Abstract

An important issue raised by Efron [7] in the context of large-scale multiple 
comparisons is that in many applications the usual assumption that the null 
distribution is known is incorrect, and seemingly negligible differences in the 
null may result in large differences in subsequent studies. This suggests that 
a careful study of estimation of the null is indispensable. 

In this paper, we consider the problem of estimating a null normal distri- 
bution, and a closely related problem, estimation of the proportion of non-null 
effects. We develop an approach based on the empirical characteristic function 
and Fourier analysis. The estimators are shown to be uniformly consistent 
over a wide class of parameters. Numerical performance of the estimators is 
investigated using both simulated and real data. In particular, we apply our
procedure to the analysis of breast cancer and HIV microarray data sets. The
estimators perform favorably in comparison to existing methods.

1 Introduction

The analysis of massive data sets now commonly arising in scientific investigations 
poses many statistical challenges not present in smaller scale studies. One such 
challenge is the need for large-scale simultaneous testing or multiple comparisons, 
in which thousands or even millions of hypotheses are tested simultaneously. In 
this setting, one considers a large number of null hypotheses $H_1, H_2, \ldots, H_n$, and is
interested in determining which hypotheses are true and which are not. Associated
with each hypothesis is a test statistic. When $H_j$ is true, the test statistic $X_j$ has a
null distribution function (d.f.) $F_0$. That is,

$$(X_j \mid H_j \text{ is true}) \sim F_0.$$

Since the pioneering work of Benjamini and Hochberg [2], which introduced the False 
Discovery Rate (FDR)-controlling procedures, research on large-scale simultaneous 
testing has been very active. See, for example, PQHJ El El El 1101 [13 [12] • 

FDR procedures are based on the p-values, which measure the tail probability 
of the null distribution. Conventionally the null distribution is always assumed to 
be known. However, somewhat surprisingly, Efron pointed out in [7] that in many
applications such an assumption would be incorrect. Efron [7] studied a data set
on breast cancer, in which a gene microarray was generated for each patient in two
groups, the BRCA1 group and the BRCA2 group. The goal was to determine which genes
were differentially expressed between the two groups. For each gene, a p-value $p_j$ was
calculated using the classical t-test. For convenience Efron chose to work on the
$z$-scale through the transformation $X_j = \bar\Phi^{-1}(p_j)$, where $\bar\Phi = 1 - \Phi$ is the survival
function of the standard normal distribution. Efron argued that, though theoretically
the null distribution should be the standard normal, empirically another null
distribution (which Efron referred to as the empirical null) is found to be more
appropriate. In fact, he found that $N(-0.02, 1.58^2)$ is a more appropriate null than
$N(0, 1)$; see Figure 1. A similar phenomenon is also found in the analysis of a
microarray data set on HIV [7].
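The move between p-values and the z-scale, and the effect of swapping the theoretical null for an empirical one, can be illustrated numerically. The snippet below is a minimal sketch using only the Python standard library; the function names `p_to_z` and `z_to_p` are hypothetical and chosen for illustration, and the empirical null parameters are the values quoted above from Efron [7].

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)

def p_to_z(p):
    """Map an upper-tail p-value to the z-scale: z = -Phi^{-1}(p)."""
    # The inverse survival function equals minus the inverse c.d.f.
    return -STANDARD.inv_cdf(p)

def z_to_p(z, mu0=0.0, sigma0=1.0):
    """Upper-tail p-value of z under a N(mu0, sigma0^2) null."""
    return 1.0 - NormalDist(mu0, sigma0).cdf(z)

if __name__ == "__main__":
    z = p_to_z(0.025)
    print("z-value:", z)
    print("p-value under N(0, 1):", z_to_p(z))
    print("p-value under N(-0.02, 1.58^2):", z_to_p(z, -0.02, 1.58))
```

The same z-value receives a noticeably larger p-value under the wider empirical null, which is why the choice of null matters for any p-value based procedure.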




Figure 1: $z$-values of microarray data on breast cancer. Left panel: QQ-plot. Right
panel: histogram and density curves of $N(0, 1)$ (dashed) and $N(-0.02, 1.58^2)$. The
plot suggests that the null is $N(-0.02, 1.58^2)$ rather than $N(0, 1)$. See Efron [7] for
further details.

Different choices of the null distribution can give substantially different outcomes 
in simultaneous multiple testing. Even a seemingly negligible estimation error of the 
null may result in large differences in subsequent studies. For illustration, we carried 
out an experiment which contains 100 independent cycles of simulations. In each
cycle, 9000 samples are drawn from $N(0, 0.95^2)$ to represent the null effects, and
1000 samples are drawn from $N(2, 0.95^2)$ to represent the non-null effects. For each
sample element $X_j$, p-values are calculated as $\bar\Phi(X_j/0.95)$ and $\bar\Phi(X_j)$, which
represent the p-values under the true null and the misspecified null, respectively.
The FDR procedure is then applied to both sets of p-values, where the FDR control
parameter is set at 0.05. The results, reported in Figure 2, show that the true
positives obtained by using $N(0, 1)$ as the null and those obtained by using $N(0, 0.95^2)$ as
the null are considerably different. This, together with Efron's arguments, suggests
that a careful study on estimating the null is indispensable.

Figure 2: The solid and dashed curves represent the number of true positives for each
cycle, using the true null and the misspecified null, respectively. For visualization,
the numbers are sorted ascendingly with respect to those in the true null case.
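One cycle of the experiment above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that "the FDR procedure" is the Benjamini-Hochberg step-up rule of [2], and the function names (`normal_sf`, `benjamini_hochberg`, `true_positives`) and the seeding scheme are hypothetical.

```python
import math
import random

def normal_sf(x):
    """Survival function 1 - Phi(x) of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def benjamini_hochberg(pvalues, q=0.05):
    """Return the set of indices rejected by the BH step-up procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda j: pvalues[j])
    k = 0  # largest rank whose sorted p-value is below its threshold
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= q * rank / n:
            k = rank
    return set(order[:k])

def true_positives(seed, n_null=9000, n_alt=1000, sd=0.95, q=0.05):
    """True positives under the true null N(0, sd^2) and under N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd) for _ in range(n_null)]
    x += [rng.gauss(2.0, sd) for _ in range(n_alt)]
    alt = set(range(n_null, n_null + n_alt))
    p_true = [normal_sf(v / sd) for v in x]   # p-values under the true null
    p_wrong = [normal_sf(v) for v in x]       # p-values under the misspecified null
    tp_true = len(benjamini_hochberg(p_true, q) & alt)
    tp_wrong = len(benjamini_hochberg(p_wrong, q) & alt)
    return tp_true, tp_wrong

if __name__ == "__main__":
    for seed in range(5):
        print(seed, true_positives(seed))
```

Because both sets of p-values are decreasing functions of $X_j$ and the true-null p-values are smaller for every positive $X_j$, the rejection set under the misspecified null $N(0, 1)$ is always contained in the one under the true null in this particular design; the sketch only shows how large the gap is in each cycle.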

Efron [7] introduced a method for estimating the null distribution based on the 
notion of "sparsity." There are several different ways to define sparsity [1]. The most 
intuitive one is that the proportion of non-null effects is small. In some applications, 
the case of "asymptotically vanishing sparsity" is of particular interest [H |6] . This 
case refers to the situation where the proportion of non-null effects tends to zero 
as the number of hypotheses grows to infinity. In such a setting, heuristically, the 
influence of the non-null effects becomes more and more negligible and so the null can
be reliably estimated asymptotically. In fact, Efron [7] suggested an approach which
uses the center and half width of the central peak of the histogram for estimating
the parameters of the null distribution.

In many applications it is more appropriate to model the setting as non-sparse, 
i.e., the proportion of non-null effects does not tend to zero when the number of 
hypotheses grows to infinity. In such settings, Efron's approach [7] does not perform
well, and it is not hard to show that the estimators of the null are generally
inconsistent. Moreover, even when the setting is asymptotically vanishingly sparse 
and the estimators are consistent, it is still of interest to quantify the influence of 
sparsity on the estimators, as a small error in the null may propagate to large errors 
in subsequent studies. 

Conventional methods for estimating the null parameters are based on either mo- 
ments or extreme observations [3,1171 ED]- However, in the non-sparse case, neither 
is very informative as the relevant information about the null is highly distorted by 
the non-null effects in both of them. In this paper, we propose a new approach for 
estimating the null parameters by using the empirical characteristic function and 
Fourier analysis as the main tools. The approach demonstrates that the information 
about the null is well preserved in the high frequency Fourier coefficients, where the 
distortion of the non-null effects is asymptotically negligible. The approach inte- 
grates the strength of several factors, including sparsity and heteroscedasticity, and 
provides good estimates of the null in a much broader range of situations than ex- 
isting approaches do. The resulting estimators are shown to be uniformly consistent 
over a wide class of parameters and outperform existing methods in simulations. 

Besides the null distribution, the proportion of non-null effects is an important
quantity. For example, the implementation of many recent procedures requires the 
knowledge of both the null and the proportion of non-null effects; see [HI UHl [19] . 
Developing good estimators for the proportion is a challenging task. Recent work
includes that of Meinshausen and Rice [17], Swanepoel [20], Cai et al. [4], and
Jin [13]. In this paper we extend the method of Jin [13] to the current setting of
heteroscedasticity with an unknown null distribution. The estimator is shown to be 
uniformly consistent over a wide class of parameters. 

In addition to the theoretical properties, numerical performance of the estimators
is investigated using both simulated and real data. In particular, we use our
procedure to analyze the breast cancer [11] and HIV [21] microarray data that were
analyzed in Efron [7]. The results indicate that our estimated null parameters lead
to a more reliable identification of differentially expressed genes than that in [7].

The paper is organized as follows. In Section 2, after basic notations and definitions
are reviewed, the estimators of the null parameters are defined in Section 2.1.
The theoretical properties of the estimators are investigated in Sections 2.2 and 2.3.
Section 2.4 discusses the extension to dependent data structures. Section 3 treats
the estimation of the proportion of non-null effects. A simulation study is carried
out in Section 4 to investigate numerical performance. In Section 5, we apply our
procedure to the analysis of the breast cancer [11] and HIV [21] microarray data.
Section 6 gives proofs of the main theorems.

2 Estimating the null distribution 

As in Efron [7], we shall work on the $z$-scale and consider $n$ test statistics

$$X_j \sim N(\mu_j, \sigma_j^2), \qquad 1 \le j \le n, \qquad (2.1)$$

where $\mu_j$ and $\sigma_j$ are unknown parameters. For a pair of null parameters $\mu_0$ and $\sigma_0$,

$$(\mu_j, \sigma_j) = (\mu_0, \sigma_0) \text{ if } H_j \text{ is true and } (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0) \text{ if } H_j \text{ is untrue}, \qquad (2.2)$$

and we are interested in estimating $\mu_0$ and $\sigma_0$. We shall first consider the case in
which $X_1, \ldots, X_n$ are independent. The dependent case is considered in Section 2.4.
Set $\mu = \{\mu_1, \ldots, \mu_n\}$ and $\sigma = \{\sigma_1, \ldots, \sigma_n\}$. Denote the proportion of non-null
effects by

$$\epsilon_n = \epsilon_n(\mu, \sigma) = \frac{\#\{j : (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0)\}}{n}. \qquad (2.3)$$

We assume $\sigma_j \ge \sigma_0$ for all $1 \le j \le n$. That is, the standard deviation of a non-null
effect is no less than that of a null effect. This is the case in a wide range of
applications [3, 15]. To make the null parameters identifiable, we shall assume

$$\epsilon_n(\mu, \sigma) \le \epsilon_0, \quad \text{for some constant } 0 < \epsilon_0 < \tfrac{1}{2}. \qquad (2.4)$$

Definition 2.1 Fix $\epsilon_0 \in (0, 1/2)$, $\mu_0$, and $\sigma_0 > 0$. We say that $(\mu, \sigma)$ is
$(\mu_0, \sigma_0, \epsilon_0)$-eligible if (2.4) is satisfied and $\sigma_j \ge \sigma_0$ for all $1 \le j \le n$.

Throughout this paper, we assume that $(\mu, \sigma)$ is $(\mu_0, \sigma_0, \epsilon_0)$-eligible.

2.1 Estimating the null parameters 

As mentioned in the Introduction, an informative approach for estimating the null 
distribution is to use the Fourier coefficients at suitable frequencies. In the litera- 
ture, Fourier coefficients have been frequently used for statistical inference; see for 
example [9, 22]. We now use them to construct estimators for the null parameters.
Introduce the empirical characteristic function

$$\varphi_n(t) = \varphi_n(t; X_1, \ldots, X_n, n) = \frac{1}{n} \sum_{j=1}^{n} e^{itX_j} \qquad (2.5)$$

and its expectation, the characteristic function
$\varphi(t) = \varphi(t; \mu, \sigma, n) = \frac{1}{n} \sum_{j=1}^{n} e^{i\mu_j t - \sigma_j^2 t^2/2}$,
where $i = \sqrt{-1}$. The characteristic function naturally splits into two components,
$\varphi(t) = \varphi_0(t) + \tilde\varphi(t)$, where $\varphi_0(t) = \varphi_0(t; \mu_0, \sigma_0, n) = (1 - \epsilon_n) \cdot e^{i\mu_0 t - \sigma_0^2 t^2/2}$ and

$$\tilde\varphi(t) = \tilde\varphi(t; \mu, \sigma, n) = \epsilon_n \cdot \mathrm{Ave}_{\{j:\ (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0)\}} \left\{ e^{i\mu_j t - \sigma_j^2 t^2/2} \right\}, \qquad (2.6)$$

which correspond to the null effects and non-null effects, respectively. Note that the
identifiability condition $\epsilon_n \le \epsilon_0 < 1/2$ ensures that $\varphi(t) \ne 0$ for all $t$.

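We now use the above functions to construct estimators for $\sigma_0^2$ and $\mu_0$. For any
$t \ne 0$ and any differentiable complex-valued function $f$ such that $|f(t)| \ne 0$, we
define the two functionals

$$\sigma_0^2(f; t) = -\frac{\frac{d}{dt}|f(t)|}{t \cdot |f(t)|}, \qquad \mu_0(f; t) = \frac{\mathrm{Re}(f(t)) \cdot \mathrm{Im}(f'(t)) - \mathrm{Re}(f'(t)) \cdot \mathrm{Im}(f(t))}{|f(t)|^2}, \qquad (2.7)$$

where $\mathrm{Re}(z)$ and $\mathrm{Im}(z)$ denote respectively the real and imaginary parts of the
complex number $z$. Simple calculus shows that evaluating the functionals at $\varphi_0$
gives the exact values of $\sigma_0^2$ and $\mu_0$: $\sigma_0^2(\varphi_0; t) = \sigma_0^2$ and $\mu_0(\varphi_0; t) = \mu_0$ for all $t \ne 0$.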
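The claim that the two functionals recover the null parameters exactly when applied to the null component can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the derivative of the modulus is computed through the identity $\frac{d}{dt}|f| = \mathrm{Re}(\overline{f} f')/|f|$, which is a standard calculus fact rather than something stated in the text.

```python
import cmath

def phi0(t, mu0, sigma0, eps=0.0):
    """Null component (1 - eps) * exp(i*mu0*t - sigma0^2 * t^2 / 2)."""
    return (1.0 - eps) * cmath.exp(1j * mu0 * t - 0.5 * sigma0 ** 2 * t ** 2)

def phi0_prime(t, mu0, sigma0, eps=0.0):
    """Derivative of the null component with respect to t."""
    return (1j * mu0 - sigma0 ** 2 * t) * phi0(t, mu0, sigma0, eps)

def sigma0_sq_functional(f, fp, t):
    """-(d/dt |f(t)|) / (t * |f(t)|), given f = f(t) and fp = f'(t)."""
    mod = abs(f)
    dmod = (f.conjugate() * fp).real / mod  # derivative of the modulus
    return -dmod / (t * mod)

def mu0_functional(f, fp):
    """(Re f * Im f' - Re f' * Im f) / |f|^2."""
    return (f.real * fp.imag - fp.real * f.imag) / abs(f) ** 2

if __name__ == "__main__":
    mu0, sigma0, eps = -0.02, 1.58, 0.1
    for t in (0.5, 1.0, 2.0):
        f, fp = phi0(t, mu0, sigma0, eps), phi0_prime(t, mu0, sigma0, eps)
        print(t, sigma0_sq_functional(f, fp, t), mu0_functional(f, fp))
```

Note that the factor $(1 - \epsilon_n)$ cancels in both ratios, so the functionals do not need to know the proportion of null effects.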

Inspired by this, we hope that for an appropriately chosen large $t$, $\varphi_n(t) \approx \varphi(t) \approx
\varphi_0(t)$, so that the contribution of non-null effects to the empirical characteristic
function is negligible, which would then give rise to good estimates for $\sigma_0^2$ and
$\mu_0$. More specifically, we use $\sigma_0^2(\varphi_n; t)$ and $\mu_0(\varphi_n; t)$ as estimators for $\sigma_0^2$ and $\mu_0$,
respectively, and hope that by choosing an appropriate $t$,

$$\sigma_0^2(\varphi_n; t) \approx \sigma_0^2(\varphi; t) \approx \sigma_0^2(\varphi_0; t) = \sigma_0^2, \qquad (2.8)$$
$$\mu_0(\varphi_n; t) \approx \mu_0(\varphi; t) \approx \mu_0(\varphi_0; t) = \mu_0. \qquad (2.9)$$

There is clearly a tradeoff in the choice of $t$. As $t$ increases from $0$ to $\infty$, the
second approximations in (2.8) and (2.9) become increasingly accurate, but the
first approximations become more unstable because the variances of $\sigma_0^2(\varphi_n; t)$ and
$\mu_0(\varphi_n; t)$ increase with $t$. Intuitively, we should choose a $t$ such that $\varphi_n(t)/\varphi(t) \approx 1$,
so that $\varphi$ can be estimated with first order accuracy. Note that by the central limit
theorem, $|\varphi_n(t) - \varphi(t)| = O_p(1/\sqrt{n})$, so $t$ should be chosen such that $|\varphi(t)| \gg 1/\sqrt{n}$.

We introduce the following method for choosing $t$, which is adaptive to the
magnitude of the empirical characteristic function. For a given $\gamma \in (0, 1/2)$, set

$$\hat t_n(\gamma) = \hat t_n(\gamma; \varphi_n) = \inf\{t : |\varphi_n(t)| = n^{-\gamma},\ 0 \le t \le \log n\}. \qquad (2.10)$$

Once we decide on the frequency $t = \hat t_n(\gamma)$, we have the following family of 'plug in'
estimators, which are indexed by $\gamma \in (0, 1/2)$:

$$\hat\sigma_0^2 = \sigma_0^2(\varphi_n; \hat t_n(\gamma)) \quad \text{and} \quad \hat\mu_0 = \mu_0(\varphi_n; \hat t_n(\gamma)). \qquad (2.11)$$

We mention here that it will be shown later in Lemma 6.3 that $\hat t_n(\gamma)$ is asymptotically
equivalent to the non-stochastic quantity

$$t_n(\gamma) = t_n(\gamma; \varphi) = \inf\{t : |\varphi(t)| = n^{-\gamma},\ 0 \le t \le \log n\}, \qquad (2.12)$$

and that the stochastic fluctuation of $\hat t_n(\gamma)$ is algebraically small and its effect is
generally negligible. We notice here that by elementary calculus,

$$t_n(\gamma; \varphi) = \left[\sqrt{2\gamma \log n}/\sigma_0\right] \cdot (1 + o(1)), \qquad (2.13)$$

where $o(1)$ tends to $0$ uniformly for all $\varphi$ under consideration.
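The frequency selection and the plug-in estimators described above can be put together in a short sketch. This is an illustration under stated assumptions, not the authors' implementation: the infimum in the definition of the data-driven frequency is approximated by the first point of a uniform grid at which the modulus of the empirical characteristic function drops to $n^{-\gamma}$, the grid step and the fallback to $\log n$ are arbitrary choices, and all function names are hypothetical.

```python
import cmath
import math
import random

def ecf_and_derivative(x, t):
    """Empirical characteristic function at t and its derivative in t."""
    n = len(x)
    vals = [cmath.exp(1j * t * v) for v in x]
    f = sum(vals) / n
    fp = sum(1j * v * e for v, e in zip(x, vals)) / n
    return f, fp

def choose_frequency(x, gamma, grid_step=0.01):
    """Grid approximation to inf{t in [0, log n] : |phi_n(t)| = n^(-gamma)}."""
    n = len(x)
    target = n ** (-gamma)
    t = grid_step
    while t <= math.log(n):
        f, _ = ecf_and_derivative(x, t)
        if abs(f) <= target:
            return t
        t += grid_step
    return math.log(n)  # assumed fallback when the level is never reached

def estimate_null(x, gamma=0.1, grid_step=0.01):
    """Return (mu0_hat, sigma0_sq_hat, t_hat) from the plug-in functionals."""
    t = choose_frequency(x, gamma, grid_step)
    f, fp = ecf_and_derivative(x, t)
    mod = abs(f)
    # -(d/dt |f|) / (t |f|), using d|f|/dt = Re(conj(f) f') / |f|
    sigma0_sq = -((f.conjugate() * fp).real / mod) / (t * mod)
    mu0 = (f.real * fp.imag - fp.real * f.imag) / mod ** 2
    return mu0, sigma0_sq, t

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(-0.5, 1 / math.sqrt(2)) for _ in range(4000)]
    print(estimate_null(data, gamma=0.1, grid_step=0.05))
```

On purely null data the selected frequency should sit close to $\sqrt{2\gamma \log n}/\sigma_0$, which gives a quick sanity check of the grid search.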

2.2 Uniform consistency of the estimators 

We now show that the estimators $\hat\sigma_0^2$ and $\hat\mu_0$ given in (2.11) are consistent uniformly
over a wide class of parameters. Introduce two non-stochastic bridging quantities,
$\sigma_0^2(\varphi; t_n(\gamma))$ and $\mu_0(\varphi; t_n(\gamma))$, which correspond to $\sigma_0^2$ and $\mu_0$, respectively. For each
estimator, the estimation error can be decomposed into two components: one is the
stochastic fluctuation and the other is the difference between the true parameter
and its corresponding bridging quantity,

$$|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2| \le |\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2(\varphi; t_n(\gamma))| + |\sigma_0^2(\varphi; t_n(\gamma)) - \sigma_0^2|, \qquad (2.14)$$
$$|\mu_0(\varphi_n; \hat t_n(\gamma)) - \mu_0| \le |\mu_0(\varphi_n; \hat t_n(\gamma)) - \mu_0(\varphi; t_n(\gamma))| + |\mu_0(\varphi; t_n(\gamma)) - \mu_0|. \qquad (2.15)$$

We shall consider the behavior of the two components separately. Fix constants
$q > 0$ and $A > 0$, and introduce the set of parameters

$$\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0) = \{(\mu, \sigma) \text{ is } (\mu_0, \sigma_0, \epsilon_0)\text{-eligible},\ M_n^{(q)}(\mu, \sigma) \le A^q\}, \qquad (2.16)$$

where $M_n^{(q)}(\mu, \sigma) = \mathrm{Ave}_{\{j:\ (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0)\}}\{(|\mu_j - \mu_0| + |\sigma_j^2 - \sigma_0^2|^{1/2})^q\}$. For a constant
$r$, we say that a sequence $\{a_n\}_{n=1}^{\infty}$ is $\bar o(n^{-r})$ if for any $\delta > 0$, $n^{r-\delta}|a_n| \to 0$ as $n \to \infty$.
The following theorem elaborates the magnitude of the stochastic component.

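Theorem 2.1 Fix constants $\gamma, \epsilon_0 \in (0, 1/2)$, $q \ge 3$, and $A > 0$. As $n \to \infty$, except
for an event with probability $\bar o(n^{-c_1})$,

$$\sup_{\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)} |\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2(\varphi; t_n(\gamma))| \le 3 c_2 \cdot \log^{1/2}(n) \cdot n^{\gamma - 1/2},$$
$$\sup_{\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)} |\mu_0(\varphi_n; \hat t_n(\gamma)) - \mu_0(\varphi; t_n(\gamma))| \le \sqrt{2\gamma}\, c_2 \cdot \log(n) \cdot n^{\gamma - 1/2},$$

where $c_2 = c_2(\sigma_0, q, \gamma) = 2\sigma_0 \cdot \sqrt{\max\{3, q - 1 - 2\gamma\}}$, and

$$c_1 = c_1(q, \gamma) = \begin{cases} (q/2 - 1 - \gamma)/2, & q < 4, \\ q/2 - 1 - \gamma, & 4 \le q \le 4 + 2\gamma, \\ (q - 1 - 2\gamma)/3, & q > 4 + 2\gamma. \end{cases} \qquad (2.17)$$

Theorem 2.1 says that the stochastic components in (2.14) and (2.15) are both
algebraically small, uniformly over $\Lambda_n$.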

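We now consider the non-stochastic components in (2.14) and (2.15). As defined
in (2.6), $\tilde\varphi(t)$ naturally factors into $\tilde\varphi(t) = e^{i\mu_0 t - \sigma_0^2 t^2/2} \cdot \psi(t)$, where

$$\psi(t) = \psi(t; \mu, \sigma, n) = \epsilon_n \cdot \mathrm{Ave}_{\{j:\ (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0)\}} \left\{ e^{i(\mu_j - \mu_0)t - (\sigma_j^2 - \sigma_0^2)t^2/2} \right\}. \qquad (2.18)$$

Lemma 6.5 in Section 6 tells us that there is a constant $C > 0$ such that uniformly
for all $(\mu_0, \sigma_0, \epsilon_0)$-eligible parameters $(\mu, \sigma)$, $|\sigma_0^2(\varphi; t_n(\gamma)) - \sigma_0^2| \le C \cdot |\psi'(t_n(\gamma))|/t_n(\gamma)$
and $|\mu_0(\varphi; t_n(\gamma)) - \mu_0| \le C \cdot |\psi'(t_n(\gamma))|$; see details therein. Combining these with
Theorem 2.1 gives the following theorem, which is proved in Section 6.

Theorem 2.2 Fix constants $\gamma, \epsilon_0 \in (0, 1/2)$, $q \ge 3$, and $A > 0$. For all $t$,
$\sup_{\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)} |\psi'(t)| \le A \epsilon_0$. Moreover, there is a constant $C = C(\gamma, q, A, \epsilon_0, \mu_0, \sigma_0)$
such that, except for an event with algebraically small probability, for any $(\mu, \sigma) \in
\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)$ and all sufficiently large $n$,

$$|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2| \le C \left( \frac{|\psi'(t_n(\gamma))|}{\sqrt{\log n}} + \log^{1/2}(n) \cdot n^{\gamma - 1/2} \right),$$
$$|\mu_0(\varphi_n; \hat t_n(\gamma)) - \mu_0| \le C \left( |\psi'(t_n(\gamma))| + \log(n) \cdot n^{\gamma - 1/2} \right).$$

Consequently, $\sigma_0^2(\varphi_n; \hat t_n(\gamma))$ is uniformly consistent for $\sigma_0^2$ over $\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)$.
Additionally, if $\psi'(t_n(\gamma)) = o(1)$, then $\mu_0(\varphi_n; \hat t_n(\gamma))$ is consistent for $\mu_0$ as well.

We remark here that $\mu_0(\varphi_n; \hat t_n(\gamma))$ is uniformly consistent for $\mu_0$ over any subset
$\Lambda_n^* \subset \Lambda_n$ with $\sup_{\Lambda_n^*}\{|\psi'(t_n(\gamma))|\} = o(1)$. Although at first glance the convergence
rates are relatively slow, they are in fact much faster in many situations.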

2.3 Convergence rate: examples and discussions 

We now show that under mild conditions the convergence rates of $\sigma_0^2(\varphi_n; \hat t_n(\gamma))$ and
$\mu_0(\varphi_n; \hat t_n(\gamma))$ can be significantly improved, and sometimes are algebraically fast.

Example I. Asymptotically vanishing sparsity. Sparsity is a natural phenomenon
found in many scientific fields such as genomics, astronomy, and image processing.
As mentioned before, asymptotically vanishing sparsity refers to the case where
$\epsilon_n(\mu, \sigma) \to 0$ (as $n \to \infty$). Several models for sparsity have been considered in the
literature, and among them are moderately sparse and very sparse, where $\epsilon_n = n^{-\beta}$
for some parameter $\beta$ satisfying $\beta \in (0, 1/2)$ and $\beta \in (1/2, 1)$, respectively [HE].
Lemma 6.5 shows that uniformly over $\Lambda_n$, $|\psi'(t_n(\gamma))| \le O(\epsilon_n(\mu, \sigma))$. Theorem 2.2
then yields the fact that the estimation errors of $\sigma_0^2(\varphi_n; \hat t_n(\gamma))$ and $\mu_0(\varphi_n; \hat t_n(\gamma))$ are
algebraically small for both the moderately sparse case and the very sparse case.

Example II. Heteroscedasticity. It is natural in many applications to find that
a non-null effect has an elevated variance. A test statistic consists of two components,
signal and noise. An elevation of variance occurs when the signal component
contributes extra variance. Denote the minimum elevation of the variance for the
non-null effects by

$$\tau_n = \tau_n(\mu, \sigma) = \min_{\{j:\ (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0)\}} \{\sigma_j^2 - \sigma_0^2\}. \qquad (2.19)$$

Lemma 6.5 shows that $|\psi'(t_n(\gamma))| \le O(\epsilon_n e^{-\gamma \log(n) \tau_n(\mu, \sigma)})$. So $\psi'(t_n(\gamma)) = o(1)$ if, say,
$\tau_n \ge \frac{\log\log n}{\log n}$; $\psi'(t_n(\gamma))$ is algebraically small if $\tau_n \ge c_0$ for some constant $c_0 > 0$.

Example III. Gaussian hierarchical model. The Gaussian hierarchical model is
widely used in statistical inference, as well as in microarray analysis; see Efron [7],
for example. A simple version of the model is where $\sigma_j = \sigma_0$ and the means $\mu_j$
associated with non-null effects are modeled as samples from a density function $h$,
$(\mu_j \mid H_j \text{ is untrue}) \sim h$. It is not hard to show that $|\psi'(t_n(\gamma))| \le \epsilon_n \cdot |\int e^{i t_n(\gamma) u}[(u -
\mu_0)h(u)]\,du|$, where the integral is the Fourier transform of the function $(u - \mu_0)h(u)$
at frequency $t_n(\gamma)$. By the Riemann-Lebesgue Lemma [16], $|\psi'(t_n(\gamma))| = o(t_n^{-k}(\gamma))$
if the $k$-th derivative of $h(u)$ is absolutely integrable. In particular, if $h$ is Gaussian,
say $N(a, b^2)$, then $|\psi'(t_n(\gamma))| \le O(\epsilon_n \cdot |t_n(\gamma)| \cdot n^{-\gamma b^2/\sigma_0^2})$ and is algebraically small.
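For a Gaussian density $h$ with mean $a$ and standard deviation $b$, the Fourier transform in Example III has a closed form whose modulus is $|(a - \mu_0) + i b^2 t| \cdot e^{-b^2 t^2/2}$; this follows from differentiating the Gaussian characteristic function and is a standard fact, not a formula quoted from the text. The sketch below compares it with a direct trapezoid-rule evaluation of the integral. The function names, the truncation at ten standard deviations, and the number of grid points are illustrative assumptions.

```python
import cmath
import math

def gaussian_pdf(u, a, b):
    """Density of N(a, b^2) at u."""
    return math.exp(-0.5 * ((u - a) / b) ** 2) / (b * math.sqrt(2.0 * math.pi))

def fourier_numeric(t, a, b, mu0, half_width=10.0, m=4000):
    """Trapezoid approximation of the integral of exp(i t u) (u - mu0) h(u) du."""
    lo, hi = a - half_width * b, a + half_width * b
    step = (hi - lo) / m
    total = 0j
    for k in range(m + 1):
        u = lo + k * step
        w = 0.5 if k in (0, m) else 1.0  # trapezoid end-point weights
        total += w * cmath.exp(1j * t * u) * (u - mu0) * gaussian_pdf(u, a, b)
    return total * step

def fourier_closed_form_abs(t, a, b, mu0):
    """|(a - mu0) + i b^2 t| * exp(-b^2 t^2 / 2)."""
    return math.hypot(a - mu0, b * b * t) * math.exp(-0.5 * (b * t) ** 2)

if __name__ == "__main__":
    for t in (0.5, 1.5, 3.0):
        print(t, abs(fourier_numeric(t, 1.0, 0.8, -0.5)),
              fourier_closed_form_abs(t, 1.0, 0.8, -0.5))
```

The factor $e^{-b^2 t^2/2}$ is what drives the rapid decay: at a frequency of order $\sqrt{2\gamma \log n}/\sigma_0$ it is a negative power of $n$, up to the linear prefactor in $t$.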

We note here that sparsity, heteroscedasticity, and the smoothness of h can 
occur at the same time, which makes the convergence even faster. In a sense, our 
approach combines the strengths of sparsity, heteroscedasticity, and the smoothness 
of the density h. The approach can thus be viewed as an extension of Efron's 
approach, as it is consistent not only in the asymptotically vanishingly sparse case, 
but also in many interesting non-sparse cases. Additionally, in the asymptotically 
vanishingly sparse case, the convergence rates of our estimators can be substantially 
faster than those of Efron. For example, this may occur when the data set is both 
sparse and heteroscedastic. 

Remark: The theory developed in Sections 2.1 - 2.3 can be naturally extended to
the Gaussian hierarchical model, which is the Bayesian counterpart of Model (2.1)-
(2.2) and has been widely used in the literature; see for example [3 HO]. The model
treats the test statistics $X_j$ as samples from a two-component Gaussian mixture:

$$X_j \sim (1 - \epsilon) N(\mu_0, \sigma_0^2) + \epsilon N(\mu_j, \sigma_j^2), \qquad 1 \le j \le n, \qquad (2.20)$$

where $(\mu_j, \sigma_j)$ are samples from a bivariate distribution $F(\mu, \sigma)$. The previous results
can be naturally extended to this model.

2.4 Extension to dependent data structures 

We now consider the proposed approach for dependent data. As the discussions
are similar, we focus on $\sigma_0^2(\varphi_n; \hat t_n(\gamma))$. Recall that the estimation error splits into
a stochastic component and a non-stochastic component, $|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2| \le
|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2(\varphi; t_n(\gamma))| + |\sigma_0^2(\varphi; t_n(\gamma)) - \sigma_0^2|$. Note that the non-stochastic
component only contains marginal effects and is unrelated to dependence structures.
We thus need only to study the stochastic component, or to extend Theorem 2.1. In
fact, once Theorem 2.1 is extended to the dependent case, the extension of Theorem
2.2 follows directly by arguments similar to those given in the proof of Theorem
2.2. For reasons of space, we shall focus on two dependent structures: the strongly
($\alpha$)-mixing case and the short-range dependent case. Denote the strongly mixing
coefficients by $\alpha(k) = \sup_{\{1 \le t \le n\}} \alpha(\sigma(X_s, s \le t), \sigma(X_s, s \ge t + k))$, where $\sigma(\cdot)$
is the $\sigma$-algebra generated by the random variables specified in the brackets, and
$\alpha(S_1, S_2) = \sup_{\{E_1 \in S_1, E_2 \in S_2\}} |P\{E_1 \cap E_2\} - P\{E_1\}P\{E_2\}|$ for any two $\sigma$-algebras
$S_1$ and $S_2$. In the strongly mixing case, we suppose that $\alpha(k) \le B k^{-d}$ for some
positive constants $B$ and $d$. In the short-range dependent case, we suppose $\alpha(k) = 0$
when $k \ge n^{\tau}$ for some constant $\tau \in (0, 1)$.

Now, fix constants $a > 0$, $B > 0$, $q \ge 3$, and $A > 0$, and introduce the following set
of parameters, which we denote by $\Lambda_n(a, B, q, A) = \Lambda_n(a, B, q, A; \epsilon_0, \mu_0, \sigma_0)$:

$$\{(\mu, \sigma) \in \Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0),\ \max_{\{1 \le j \le n\}}\{|\mu_j| + |\sigma_j|\} \le B \log^{a}(n)\}.$$

Note that this technical condition is not essential and can be relaxed. The following
theorem treats the strongly mixing case and is proved in [14, Section 7].

Theorem 2.3 Fix d > 1.5, g > 3, 7 G (0, ^^), A > 0, a > 0, and B > 0. 

Suppose a{k) < Bk^'^ for all 1 < k < n. As n ^ 00, uniformly for all (/i, cr) G 
A„(a, B, q, A), except for an event with asymptotically vanishing probability, 

An interesting question is whether this result holds for all $\gamma \in (0, 1/2)$; we leave this
for future study. The following theorem concerns the short-range dependent case,
whose proof is similar to that of Theorem 2.3 and is thus omitted.

Theorem 2.4 Fix q > 3, t e (0,1), 7 G (0,^), A > 0, a > 0, and B > 
0. Suppose a{k) = for all k > . As n ^ 00, uniformly for all {fi,cr) G 
An{a,B,q,A), except for an event with asymptotically vanishing probability, 

Wl{^n;in{l))-(^l{v;tn{l))\ < o{n"''^), |/io (y?„; 4 (7) ) -/iQ tn (7) ) I < o{n^~^). 

We mention that consistency for more general dependent settings is possible provided
the following two key requirements are satisfied. First, there is an exponential
type inequality for the tail probability of $|\varphi_n(t) - \varphi(t)|$ for all $t \in (0, \log n)$; we use
Hoeffding's inequality in the proof for the independent case, and use [3, Theorem 1.3] in
the proof of Theorem 2.3. Second, the standard deviation of $\varphi_n(t_n(\gamma))$ has a smaller
order than that of $\varphi(t_n(\gamma))$, so that the approximation $\varphi_n(t_n(\gamma))/\varphi(t_n(\gamma)) \approx 1$ is
accurate to the first order.

3 Estimating the proportion of non-null effects

The development of useful estimators for the proportion of non-null effects, together
with the corresponding statistical analysis, poses many challenges. Recent work
includes that of Meinshausen and Rice [17], Swanepoel [20], Cai et al. [4], and
Jin [13]. See also [H [10]. The first two approaches only provide consistent estimators
under a condition which Genovese and Wasserman call "purity" [10]. These
approaches do not perform well in the current setting as the purity condition is not
satisfied; see Lemma 3.1 for details. Cai et al. [4] largely focus on a very sparse
setting, and so a more specific model is needed. Jin [13] considers estimating the
proportion of nonzero normal means but concentrates on the homoscedastic case
with known null parameters. This motivates a careful study of estimation of the
proportion in the current setting.

We begin by first assuming that the null parameters are known. In this case
the approach of Jin [13] can be extended to the heteroscedastic setting here. Fix
$\gamma \in (0, 1/2)$. The following estimator is proposed in [13] for the homoscedastic case:

$$\hat\epsilon_n(\gamma) = \hat\epsilon_n(\gamma; X_1, \ldots, X_n, n) = \sup_{\{0 \le t \le \sqrt{2\gamma \log n}\}} \{1 - \Omega_n(t; X_1, \ldots, X_n, n)\}, \qquad (3.1)$$

where $\Omega_n(t; X_1, \ldots, X_n, n) = \int_{-1}^{1} (1 - |\xi|) \left( \mathrm{Re}(\varphi_n(t\xi; X_1, \ldots, X_n, n) e^{-i\mu_0 t\xi + \sigma_0^2 t^2 \xi^2/2}) \right) d\xi$.
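A direct numerical version of this estimator is sketched below. It is an illustration under several assumptions rather than the authors' implementation: the integrand is taken to be the triangle-weighted real part of the empirical characteristic function evaluated at $t\xi$ and multiplied by the inverse of the null characteristic function, the integral over $\xi$ is replaced by a trapezoid rule (using symmetry in $\xi$), the supremum over $t$ is replaced by a maximum over a coarse grid, and the function names and grid sizes are hypothetical.

```python
import math
import random

def omega_n(x, t, mu0=0.0, sigma0=1.0, m=20):
    """Trapezoid approximation of the triangle-weighted integral over xi."""
    n = len(x)
    total = 0.0
    for k in range(m + 1):
        xi = k / m
        s = t * xi
        # Real part of phi_n(s) * exp(-i*mu0*s + sigma0^2 * s^2 / 2)
        mean_cos = sum(math.cos(s * (v - mu0)) for v in x) / n
        integrand = 2.0 * (1.0 - xi) * mean_cos * math.exp(0.5 * (sigma0 * s) ** 2)
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * integrand
    return total / m

def estimate_proportion(x, gamma=0.2, mu0=0.0, sigma0=1.0, n_t=10):
    """Maximum of 1 - omega_n(t) over a grid of t in [0, sqrt(2*gamma*log n)]."""
    t_max = math.sqrt(2.0 * gamma * math.log(len(x)))
    return max(1.0 - omega_n(x, t_max * k / n_t, mu0, sigma0)
               for k in range(n_t + 1))

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(1600)]
    data += [rng.gauss(4.0, 1.2) for _ in range(400)]
    print(estimate_proportion(data, gamma=0.2))
```

For null observations the weighted cosine average has expectation one at every frequency, so only the non-null observations pull $\Omega_n$ below one; this is the mechanism the estimator exploits.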

This estimator continues to be consistent for the current heteroscedastic case. Set

$$\Theta_n(\gamma; q, A, \mu_0, \sigma_0, \epsilon_0) = \left\{ (\mu, \sigma) \in \Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0),\ \Delta_n \ge \frac{\log\log n}{\log n},\ \epsilon_n(\mu, \sigma) \ge n^{\gamma - 1/2} \right\},$$

where $\Delta_n = \Delta_n(\mu, \sigma) = \min_{\{j:\ (\mu_j, \sigma_j) \ne (\mu_0, \sigma_0)\}}\{\max\{|\mu_j - \mu_0|^2, |\sigma_j^2 - \sigma_0^2|\}\}$.

Theorem 3.1 For any $\gamma \in (0, 1/2)$, $q \ge 1$, and $A > 0$, except for an event with
algebraically small probability, $\lim_{n \to \infty} \left( \sup_{\Theta_n(\gamma; q, A, \mu_0, \sigma_0, \epsilon_0)} \left\{ \left| \frac{\hat\epsilon_n(\gamma)}{\epsilon_n(\mu, \sigma)} - 1 \right| \right\} \right) = 0$.

Roughly speaking, the estimator is consistent if the proportion is asymptotically
larger than $1/\sqrt{n}$. The case where the proportion is asymptotically smaller than
$1/\sqrt{n}$ is very challenging, and usually it is very hard to construct consistent estimates
without a more specific model; see [1[ [6] for more discussion. 

We now turn to the case where the null parameters $(\mu_0, \sigma_0)$ are unknown. A
natural approach is to first use the proposed procedures in Section 2.1 to obtain
estimates for $\mu_0$ and $\sigma_0$, say $\hat\mu_0$ and $\hat\sigma_0$, and then plug them into (3.1) for estimation of
the proportion. This yields the estimate $\hat\epsilon_n^*(\gamma; \hat\mu_0, \hat\sigma_0) = \hat\epsilon_n^*(\gamma; \hat\mu_0, \hat\sigma_0, X_1, \ldots, X_n, n)$.
Theorem 3.2 below describes how $(\hat\sigma_0, \hat\mu_0)$ affects the estimation accuracy of $\hat\epsilon_n^*$.

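Theorem 3.2 Fix $\epsilon_0 \in (0, 1/2)$, $\gamma \in (0, 1/2)$, $q \ge 1$, and $A > 0$. As $n \to \infty$,
suppose that except for an event $B_n$ with algebraically small probability, $\max\{|\hat\mu_0 -
\mu_0|^2, |\hat\sigma_0^2 - \sigma_0^2|\} = o(\frac{1}{\log n})$. Then there are a constant $C = C(\gamma, q, A, \mu_0, \sigma_0, \epsilon_0) > 0$
and an event $D_n$ with algebraically small probability, such that over $B_n^c \cap D_n^c$,

$$|\hat\epsilon_n^*(\gamma; \hat\mu_0, \hat\sigma_0) - \hat\epsilon_n(\gamma)| \le C \cdot \left[ \log^{3/2}(n) \cdot n^{\gamma - 1/2} + \log n \cdot |\hat\sigma_0^2 - \sigma_0^2| + \sqrt{\log n} \cdot |\hat\mu_0 - \mu_0| \right].$$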

Results in previous sections show that, under mild conditions, the estimation
errors of $(\hat\mu_0, \hat\sigma_0)$ are algebraically small, and so is $\hat\epsilon_n^*(\gamma) - \hat\epsilon_n(\gamma)$. In the non-sparse
case, such differences are negligible and both $\hat\epsilon_n(\gamma)$ and $\hat\epsilon_n^*(\gamma)$ are consistent. The
sparse case, especially when the proportion is algebraically small, is more subtle. In
this case a more specific model is often needed. See Cai et al. [4].

We now compare our procedure with those in Meinshausen and Rice [17] and
in Cai et al. [4]. We begin by introducing the aforementioned purity condition.
Suppose we model the p-values of the test statistics as samples from a mixture density,
$(1 - \epsilon)U(0, 1) + \epsilon h$, where $U(0, 1)$ and $h$ are the marginal densities of the p-values for
the null effects and non-null effects, respectively. The purity condition is defined as
$\mathrm{essinf}_{\{0 < p < 1\}} h(p) = 0$. Meinshausen and Rice propose a confidence lower bound
for $\epsilon$ that is valid for all $h$. Despite this advantage, the lower bound is generally
conservative and inconsistent. In fact, the purity condition is necessary for the lower
bound to be consistent. Similar results can be found in Genovese and Wasserman
[10]. Unfortunately, the purity condition generally does not hold in our settings.

Lemma 3.1 Let the test statistics $X_j$ be given as in (2.20). If the marginal distribution
$F(\mu, \sigma)$ satisfies either $P_F\{\sigma > 1\} \ne 0$ or $P_F\{\sigma = 1\} = 1$ but $P_F\{\mu > 0\} \ne 0$
and $P_F\{\mu < 0\} \ne 0$, then the purity condition does not hold.

Cai et al. [4] consider a very sparse setting for a two-point Gaussian mixture
model where the proportion is modeled as $n^{-\beta}$ with $\beta \in (1/2, 1)$. Their estimator
is consistent whenever consistent estimation is possible, and it attains the optimal
rate of convergence. In a sense, their approach complements our method: the former 
deals with a very sparse but more specific model, and the latter deals with a more 
general model where the level of sparsity is much lower. 

4 Simulation experiments 

We now turn to the numerical performance of our estimators of the null parameters. 
The goal for the simulation study is three-fold: to investigate how different choices 
of 7 affect the estimation errors, to compare the performance of our approach with 
that in Efron [7], and to investigate the performance of the proposed approach for 
dependent data. We leave the study for real data to Section 5.

We first investigate the effect of $\gamma$ on the estimation errors. Set $\sigma_0 = 1/\sqrt{2}$ and
$\mu_0 = -1/2$ throughout this section. We take $n = 10000$, $\epsilon = 0.1$, and $a = 0.75$, $1.00$,
$1.25$, and $1.50$ for the following simulation experiment:

Step 1. (Main Step). For each $a$, first generate $n\epsilon$ pairs of $(\mu_j, \sigma_j)$ with $\mu_j$ from
$N(0, 1)$ and $\sigma_j$ from the uniform distribution $U(a, a + 0.5)$, and then generate
a sample from $N(\mu_j, \sigma_j^2)$ for each pair of $(\mu_j, \sigma_j)$. These $n\epsilon$ samples represent
the non-null effects. In addition, generate $n \cdot (1 - \epsilon)$ samples from $N(\mu_0, \sigma_0^2)$
to represent the null effects.

Step 2. For the samples obtained in Step 1, implement $\hat\sigma_0(\gamma) = \sigma_0(\varphi_n; \hat t_n(\gamma))$ and
$\hat\mu_0(\gamma) = \mu_0(\varphi_n; \hat t_n(\gamma))$ for each $\gamma = 0.01, 0.02, \ldots, 0.5$.

Step 3. Repeat Steps 1 and 2 for 100 independent cycles.
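The data-generating part of the experiment (Step 1) can be sketched as follows. This is only an illustration: the function name `generate_samples`, the seeding scheme, and the rounding of the number of non-null effects are assumptions, and the second argument of each normal distribution in the text is read as a variance, so the code passes the standard deviation to the random number generator.

```python
import math
import random

def generate_samples(n=10000, eps=0.1, a=1.0, mu0=-0.5,
                     sigma0=1 / math.sqrt(2), seed=0):
    """Return (null_samples, non_null_samples) following Step 1."""
    rng = random.Random(seed)
    n_alt = round(n * eps)
    non_null = []
    for _ in range(n_alt):
        mu_j = rng.gauss(0.0, 1.0)           # mean drawn from N(0, 1)
        sigma_j = rng.uniform(a, a + 0.5)    # standard deviation from U(a, a + 0.5)
        non_null.append(rng.gauss(mu_j, sigma_j))
    null = [rng.gauss(mu0, sigma0) for _ in range(n - n_alt)]
    return null, non_null

if __name__ == "__main__":
    null, non_null = generate_samples(a=0.75)
    print(len(null), len(non_null))
```

With every value of $a$ used in the experiment, each non-null standard deviation is at least $0.75 > 1/\sqrt{2}$, so the eligibility requirement that non-null effects are at least as dispersed as null effects holds by construction.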




Figure 3: x-axis: $\gamma$. y-axis: mean squared error (MSE). Top row: MSE for $\hat\sigma_0(\gamma)$
(left) and $\hat\mu_0(\gamma)$ (right). The four different curves (solid, dashed, dot, and circle)
correspond to $a = 0.75$, $1.00$, $1.25$, and $1.50$. Bottom row: zoom in.

The results, reported in Figure 3, suggest that the best choice of $\gamma$ for both $\hat\sigma_0(\gamma)$
and $\hat\mu_0(\gamma)$ is in the range (0.1, 0.15). With $\gamma$ in this range, the performance of the
estimators is not very sensitive to different choices of $\gamma$, and both estimators are
accurate. Taking $\gamma = 0.1$, for example, the mean squared errors for $\hat\sigma_0(\gamma)$ and $\hat\mu_0(\gamma)$
are of magnitude 10~^ and 10~^, respectively. These suggest the use of the following
estimators for simplicity, where we take $\gamma = 0.1$:

$$\hat\sigma_0^* = \sigma_0(\varphi_n; \hat t_n(0.1)), \qquad \hat\mu_0^* = \mu_0(\varphi_n; \hat t_n(0.1)). \qquad (4.1)$$

We now compare $(\hat\sigma_0^*, \hat\mu_0^*)$ with the estimators in Efron [7]. Recall that one major
difference between the two approaches is that Efron's estimators are not consistent
for the non-sparse case, while ours are. It is thus of interest to make comparisons at
different levels of sparsity. To do so, we set $a$ at 1, and let $\epsilon$ take four different values,
0.05, 0.10, 0.15, and 0.20, to represent different levels of sparsity. For each $\epsilon$, we first
generate samples according to the main step in the aforementioned experiment, then
implement $(\hat\sigma_0^*, \hat\mu_0^*)$ and the estimators of Efron [7], and finally repeat the experiment
for 100 independent cycles. The results are reported in Figures 4 - 5.




Figure 4: Histograms for the estimation errors of Efron's estimator for $\sigma_0$ (top row)
and $\hat\sigma_0^*$ (bottom row). From left to right: $\epsilon = 0.05$, $0.10$, $0.15$, and $0.20$.






Figure 5: Histograms for the estimation errors of Efron's estimator for $\mu_0$ (top row)
and $\hat\mu_0^*$ (bottom row). From left to right: $\epsilon$ = 0.05, 0.10, 0.15, and 0.20.

The results show that our estimator of $\sigma_0$ is more accurate than that of Efron [7],
and the difference becomes more prominent as $\epsilon$ increases. In fact, when $\epsilon$ ranges
between 0.05 and 0.2, the estimation errors of CTq are of the order 10~^, while those 
of Efron's estimator could get as large as the order 10^^. On the other hand, the 
two estimators of $\mu_0$ are almost equally accurate, and the estimation errors for both
approaches fluctuate around 0.02 across different choices of $\epsilon$.

However, the above comparison is only for moderately large $n$. With a much
larger $n$, the previous theory (Theorem 2.2) predicts that the estimation errors of
$(\hat\sigma_0^*, \hat\mu_0^*)$ will become substantially smaller, as $(\hat\sigma_0^*, \hat\mu_0^*)$ is consistent for $(\sigma_0, \mu_0)$. In
comparison, the errors of Efron's estimators will not become substantially smaller,
as the estimators are not consistent. To illustrate this point, we carry out a small-scale
simulation experiment. We take $\epsilon = 0.1$ and $a = 1$ as before, while we let
$n = 10^4$, $4 \times 10^4$, $1.6 \times 10^5$, and $6.4 \times 10^5$. For each $n$, we generate samples according
to the main step, calculate the mean squared errors (MSE), and repeat the process
for 30 independent cycles. The results are reported in Table 1, and they support
the asymptotic analysis.



                                      n = 10^4   4 x 10^4   1.6 x 10^5   6.4 x 10^5

MSE for $\sigma_0$   Efron's approach     9.100      8.564      8.415        8.567
                 Our approach         0.816      0.276      0.047        0.031

MSE for $\mu_0$      Efron's approach     8.916      5.905      3.957        3.617
                 Our approach         5.807      3.019      1.106        0.538



Table 1: Mean squared errors (MSE) for various values of $n$. The corresponding
MSE equals the value in each cell times $10^{-4}$.

Finally, we investigate the performance of the proposed procedures for dependent
data. Fix $n = 10^4$, $\epsilon = 0.1$, and $a = 1$, and let $L$ range from 0 to 250 with an
increment of 5. For each $L$, generate $n + L$ samples $w_1, w_2, \ldots, w_{n+L}$ from $N(0, 1)$
and let $z_j = (\sum_{k=j}^{j+L} w_k) / \sqrt{L + 1}$, so that $\{z_j\}_{j=1}^{n}$ are block-wise dependent (block
size equal to $L + 1$) and the marginal distribution of each $z_j$ is $N(0, 1)$. At the
same time, generate the mean vector $\mu$ and the vector of standard deviations $\sigma$
according to the main step, let $X_j = \mu_j + \sigma_j \cdot z_j$, and apply $(\hat\mu_0^*, \hat\sigma_0^*)$ to $\{X_j\}_{j=1}^{n}$.
We then repeat the process for 100 independent cycles. The results are reported
in Figure 6, which suggests that the estimation errors increase as the range of
dependency increases. However, when $L < 100$, for example, the estimation errors
are still relatively small, especially those for $\hat\sigma_0^*$. This suggests that the procedures
are relatively robust to short-range dependency.
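
The block-wise dependent noise described above is a normalized moving sum, which can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and the values of `n`, `L`, and the seed are arbitrary choices for the example.

```python
import numpy as np

def blockwise_dependent_noise(n, L, rng):
    """Return z_1..z_n with z_j = (w_j + ... + w_{j+L}) / sqrt(L + 1), w_k i.i.d. N(0, 1)."""
    w = rng.standard_normal(n + L)
    # A length-(L + 1) moving sum over n + L values yields exactly n values.
    return np.convolve(w, np.ones(L + 1), mode="valid") / np.sqrt(L + 1)

rng = np.random.default_rng(0)
n, L = 10_000, 50
z = blockwise_dependent_noise(n, L, rng)

# Each z_j has marginal distribution N(0, 1); z_j and z_k are independent
# whenever |j - k| > L, since they then share no w terms.
```

Given a mean vector `mu` and a standard-deviation vector `sigma` from the main step, the observations would then be formed as `x = mu + sigma * z`.
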




Figure 6: x-axis: $L$. y-axis: root mean squared error for $\hat\mu_0^*$ (dashed) and $\hat\sigma_0^*$ (solid).



5 Applications to microarray analysis 

We now apply the proposed procedures to the analysis of the breast cancer and HIV
microarray data sets that were analyzed in Efron [7]. The R code for our procedures
is available on the web at http://www.stat.purdue.edu/~jinj/Research/software. The
z-scores for both data sets can be downloaded from this site as well; they were kindly
provided by Bradley Efron. The R code for Efron's procedures and related software
can be downloaded from http://cran.r-project.org/src/contrib/Descriptions/locfdr.html.
For reasons of space, we focus on the breast cancer data and only
comment briefly on the HIV data.

The breast cancer data were based on 15 patients diagnosed with breast cancer, 7
with the BRCA1 mutation and 8 with the BRCA2 mutation. Each patient's tumor
was analyzed on a separate microarray, and the microarrays reported on the same
set of $n = 3226$ genes. For the $j$-th gene, the two-sample t-test comparing the seven
BRCA1 responses with the eight BRCA2 responses was computed. The t-score $y_j$ was first
converted to the p-value by $p_j = \bar F_{13}(y_j)$, and was then converted to the z-scale [7],
$X_j = \bar\Phi^{-1}(p_j) = \bar\Phi^{-1}(\bar F_{13}(y_j))$, where $\bar\Phi$ and $\bar F_{13}$ are the survival functions of $N(0, 1)$
and the t-distribution with 13 degrees of freedom, respectively.
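
The t-to-z conversion above can be sketched with SciPy's survival and inverse-survival functions. This is a minimal sketch, assuming SciPy is available. The function name and the input t-scores are made up for illustration, and 13 = 7 + 8 - 2 degrees of freedom follows the two-sample setup described above.

```python
import numpy as np
from scipy import stats

def t_to_z(y, df=13):
    """Map t-scores to the z-scale: X_j = Phi_bar^{-1}(F_bar_df(y_j))."""
    p = stats.t.sf(y, df)      # p_j = F_bar_13(y_j), survival function of the t-distribution
    return stats.norm.isf(p)   # X_j = Phi_bar^{-1}(p_j), inverse survival function of N(0, 1)

# Hypothetical t-scores, not taken from the breast cancer data.
y = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
x = t_to_z(y)
```

The map is increasing and shrinks large t-scores toward zero, since the t-distribution has heavier tails than the standard normal.
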

We model $X_j$ as $N(\mu_j, \sigma_j^2)$ variables with weakly dependent structure, and for a
pair of unknown parameters $(\mu_0, \sigma_0)$, $(\mu_j, \sigma_j) = (\mu_0, \sigma_0)$ if and only if the $j$-th gene
is not differentially expressed. Since $X_j$ is transformed from the t-score, which has
been standardized by the corresponding standard error, it is reasonable to assume
that the null effects are homogeneous, and that all effects are homoscedastic; see, for
example, [5], [7]. The normality assumption is also reasonable here, as the marginal
density of non-null effects can generally be well approximated by Gaussian mixtures;
see O Page 99]. Particularly, it is well known that the set of all Gaussian mixing 
densities is dense in the set of all density functions under the $L^1$-metric.

We now proceed with the data analysis. The analysis includes three parts:
estimating the null parameters $(\sigma_0, \mu_0)$, estimating the proportion of non-null effects,
and implementing the local FDR approach proposed by Efron et al. [8].

The first part is estimating $(\sigma_0, \mu_0)$. We apply $(\hat\sigma_0^*, \hat\mu_0^*)$ (defined in (4.1)) as well
as the estimators used by Efron [7] to the z-scores. For the breast cancer data,
our procedure yields $(\hat\sigma_0^*, \hat\mu_0^*) = (1.5277, -0.0525)$, while Efron's estimators give
$(\hat\sigma_0, \hat\mu_0) = (1.616, -0.082)$.

The second part of the analysis is estimating the proportion of non-null effects. 
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Our estimator 


Local FDR 


MR 


CJL 


Our Estimated Null 


0.0040 


0.0128 


0.0033 





Efron's Estimated Null 








0.0098 






Table 2: Estimated proportion of non-null effects for the breast cancer data. 

We apply our procedure, as well as Meinshausen and Rice's [17] approach and
the approach of Cai et al. [4] (which we denote by MR and CJL, respectively, for
short), to the z-scores of the breast cancer data. The bounding function $a_n^*$ for the
MR estimator is set as $1.25 \times \sqrt{2 \log\log n} / \sqrt{n}$, and the $a_n$ for the CJL estimator is set
as $\sqrt{2 \log\log n} / \sqrt{n}$; see P] for details. Using the estimated null parameters
obtained either by Efron's approach or by our approach, we apply each of these
procedures to the z-scores. In addition, the local FDR approach also provides an
estimate for the proportion automatically. The results are reported in Table 2.

In the last part of the analysis we implement the local FDR thresholding procedure
proposed in [8] with the z-scores of the breast cancer data. For any given
FDR-control parameter $q \in (0, 1)$, the procedure calculates a score for each data
point and determines a threshold $t_q$ at the same time. A hypothesis is rejected if the
score exceeds the threshold and is accepted otherwise. If we call a rejected hypothesis
a "discovery," then the local FDR thresholding procedure controls the expected
false discovery rate at level $q$,
$E[\#\text{False Discoveries} / \#\text{Total Discoveries}] \le q$; see [8] for details.

With Efron's estimated null parameters, for any fixed $q \in (0, 1)$, the local FDR
procedures report no rejections for the breast cancer data set. Also, three different
estimators for the proportion report 0. These results suggest that the proportion of
signals (differentially expressed genes) is small, that the signal is very weak, or both.

In contrast, with our estimated null parameters, the estimated proportions are
small but nonzero. Furthermore, the local FDR procedures report rejections when
$q > 0.91$. For example, the number of total discoveries equals 167 when $q = 0.92$,
and equals 496 when $q = 0.94$. Take $q = 0.94$, for example: since for any $q \in (0, 1)$
the number of true discoveries approximately equals $(1 - q)$ times the number of total
discoveries [7], this suggests a total of 30 true discoveries. The result is consistent
with biological discoveries. Among the 496 genes identified as differentially
expressed by the local FDR procedures, 17 have been discovered
in the study by Hedenfalk et al. [11]. The corresponding Unigene cluster IDs are:
Hs.182278, Hs.82916, Hs.179661, Hs.119222, Hs.10247, Hs.469, Hs.78996, Hs.11951,
Hs.79078, Hs.9908, Hs.5085, Hs.171271, Hs.79070, Hs.78934, Hs.469, Hs.197345,
Hs.73798. We also identified several genes whose functions are associated with the
cell cycle, including PCNA, CCNA2, and CKS2. These genes were found to be significant
by Storey et al. [19]. The results indicate that our estimated null parameters
lead to reliable identification of differentially expressed genes.
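
As a quick arithmetic check of the counts quoted above (a worked instance of the approximation attributed to [7], not an additional result), using the reported totals of 496 and 167 discoveries:

```latex
(1 - q) \times \#\{\text{total discoveries}\} = (1 - 0.94) \times 496 = 29.76 \approx 30,
\qquad
(1 - 0.92) \times 167 = 13.36 \approx 13 .
```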

Similarly, for the HIV data, our estimators give $(\hat\sigma_0^*, \hat\mu_0^*) = (0.7709, -0.0806)$,
while Efron's method gives $(\hat\sigma_0, \hat\mu_0) = (0.738, -0.082)$. With $q = 0.05$, the local
FDR procedures report 59 total discoveries with our estimated null parameters, and
80 with Efron's estimated null parameters; the latter yields slightly more signals.



6 Proofs of the main results 

We now prove Theorems 2.1, 2.2, and 3.1. The proof of Theorem 3.2 is similar to
those of Theorems 2.2 and 3.1 and so is omitted. As the proofs for the estimators
of $\sigma_0^2$ and $\mu_0$ are similar, we focus on $\sigma_0^2$. We first collect a few technical results and
outline the basic ideas. The proofs of these preparatory lemmas are given in
the Appendix.



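Lemma 6.1 Let $\sigma_0^2(\cdot; \cdot)$ and $\mu_0(\cdot; \cdot)$ be defined as in (2.7). Fix $t > 0$. For any
differentiable complex-valued functions $f$ and $g$ satisfying $|f(t)| \ne 0$ and $|g(t)| \ne 0$,

$|\sigma_0^2(f, t) - \sigma_0^2(g, t)| \le \frac{1}{t |f(t)|^2} [(2t \cdot |\sigma_0^2(g, t)| \cdot |g(t)| + |g'(t)|) \cdot |f(t) - g(t)| + |g(t)| \cdot |f'(t) - g'(t)|] + r_n^{(1)}(t),$

$|\mu_0(f, t) - \mu_0(g, t)| \le \frac{1}{|f(t)|^2} [(2 |\mu_0(g, t)| \cdot |g(t)| + |g'(t)|) \cdot |f(t) - g(t)| + |g(t)| \cdot |f'(t) - g'(t)|] + r_n^{(2)}(t),$

where $r_n^{(1)}(t) = \frac{1}{t |f(t)|^2} \cdot [t \cdot |\sigma_0^2(g, t)| \cdot |f(t) - g(t)|^2 + |f(t) - g(t)| \cdot |f'(t) - g'(t)|]$ and
$r_n^{(2)}(t) = \frac{1}{|f(t)|^2} \cdot [|\mu_0(g, t)| \cdot |f(t) - g(t)|^2 + |f(t) - g(t)| \cdot |f'(t) - g'(t)|]$.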
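
The first inequality of Lemma 6.1 depends on $f$ and $g$ only through the four complex numbers $f(t)$, $f'(t)$, $g(t)$, $g'(t)$, so it can be spot-checked numerically. The sketch below draws these values at random and records the smallest gap between the bound and the left-hand side. It assumes $\sigma_0^2(f; t) = -\mathrm{Re}(f'(t)\overline{f(t)}) / (t |f(t)|^2)$, the form used in the proof in Section 7.2, and all names are hypothetical.

```python
import numpy as np

def sigma0_sq(f, df, t):
    """sigma_0^2(f; t) = -(d/dt |f|) / (t |f|) = -Re(f' conj(f)) / (t |f|^2)."""
    return -(df * np.conj(f)).real / (t * abs(f) ** 2)

def lemma_bound(f, df, g, dg, t):
    """Right-hand side of the first inequality, including the remainder term r_n^(1)."""
    s_g = abs(sigma0_sq(g, dg, t))
    main = (2 * t * s_g * abs(g) + abs(dg)) * abs(f - g) + abs(g) * abs(df - dg)
    rem = t * s_g * abs(f - g) ** 2 + abs(f - g) * abs(df - dg)
    return (main + rem) / (t * abs(f) ** 2)

rng = np.random.default_rng(1)
worst_slack = np.inf
for _ in range(10_000):
    # Random stand-ins for f(t), f'(t), g(t), g'(t) and for t > 0.
    f, df, g, dg = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    t = rng.uniform(0.1, 5.0)
    lhs = abs(sigma0_sq(f, df, t) - sigma0_sq(g, dg, t))
    worst_slack = min(worst_slack, lemma_bound(f, df, g, dg, t) - lhs)
```

A nonnegative `worst_slack` is consistent with the lemma; it is a sanity check on the reconstruction, not a proof.
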

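Heuristically, $|\varphi(\hat t_n)| / |\varphi_n(\hat t_n)|^2 \sim n^{\gamma}$, $\sigma_0^2(\varphi; \hat t_n) \sim \sigma_0^2$, $|\varphi'(\hat t_n)| / |\varphi(\hat t_n)| \sim \sigma_0^2 \hat t_n$, and

$|\varphi_n(\hat t_n) - \varphi(\hat t_n)| \le O_p(\sqrt{\log n} / \sqrt{n}), \quad |\varphi_n'(\hat t_n) - \varphi'(\hat t_n)| \le O_p(\sqrt{\log n} / \sqrt{n}).$ (6.1)

Applying Lemma 6.1 with $f = \varphi_n$, $g = \varphi$, and $t = \hat t_n(\gamma)$, we have

$|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2(\varphi; \hat t_n(\gamma))| \sim n^{\gamma} (3 \sigma_0^2 |\varphi_n(\hat t_n(\gamma)) - \varphi(\hat t_n(\gamma))| + \frac{1}{\hat t_n(\gamma)} |\varphi_n'(\hat t_n(\gamma)) - \varphi'(\hat t_n(\gamma))|) \sim O(n^{\gamma - 1/2} \sqrt{\log n}),$

and Theorem 2.1 follows. We now study (6.1) in detail.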

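Lemma 6.2 Set $W_0(\varphi_n; n) = W_0(\varphi_n; n, X_1, \ldots, X_n) = \sup_{0 \le t \le \log n} |\varphi_n(t) - \varphi(t)|$.
Fix $q_1 > 3$. Let $\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)$ be given as in Theorem 2.1. When $n \to \infty$,

$\sup_{(\mu, \sigma) \in \Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)} P\{W_0(\varphi_n; n) \ge \sqrt{2 q_1 \log n} / \sqrt{n}\} \le 4 \log^2(n) \cdot n^{-q_1/3} \cdot (1 + o(1)).$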
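
To get a feel for the scale in Lemma 6.2, the sketch below approximates $W_0(\varphi_n; n)$ on a grid for purely null data, where $\varphi(t) = e^{-t^2/2}$, and compares it with the threshold $\sqrt{2 q_1 \log n} / \sqrt{n}$. This is an illustration under simplifying assumptions: all effects are null with $(\mu_0, \sigma_0) = (0, 1)$, and the grid size, seed, and $q_1 = 3.5$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q1 = 10_000, 3.5
x = rng.standard_normal(n)                  # all effects null, (mu0, sigma0) = (0, 1)
t_grid = np.linspace(0.0, np.log(n), 400)   # grid over [0, log n]

# Empirical characteristic function on the grid, and the true phi(t) = exp(-t^2 / 2).
phi_n = np.array([np.mean(np.exp(1j * t * x)) for t in t_grid])
phi = np.exp(-t_grid ** 2 / 2)

w0_grid = np.max(np.abs(phi_n - phi))       # grid approximation to W_0(phi_n; n)
threshold = np.sqrt(2 * q1 * np.log(n)) / np.sqrt(n)
```

In runs of this kind the grid supremum typically sits well below the threshold, in line with the algebraically small exceedance probability stated in the lemma.
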

Lemma 6.2 implies that, except for an event with algebraically small probability,
$|\varphi_n(\hat t_n) - \varphi(\hat t_n)| \le W_0(\varphi_n; n) \le \sqrt{2 q_1 \log n} / \sqrt{n}$. This naturally leads to a precise
description of the stochastic behavior of $|\hat t_n(\gamma) - t_n(\gamma)|$ given in the following lemma.

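Lemma 6.3 Let $q_1 > 3$ and let $\Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)$, $\hat t_n(\gamma)$, and $t_n(\gamma)$ be given as in
Theorem 2.1. When $n \to \infty$,

$\sup_{(\mu, \sigma) \in \Lambda_n(q, A; \mu_0, \sigma_0, \epsilon_0)} \{|\hat t_n(\gamma) - t_n(\gamma)| \cdot 1_{\{W_0(\varphi_n; n) \le \sqrt{2 q_1 \log n} / \sqrt{n}\}}\} \le \frac{1}{\sigma_0} \sqrt{\frac{q_1}{\gamma}} \cdot n^{\gamma - 1/2} (1 + o(1)).$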
We now study $|\varphi_n'(\hat t_n) - \varphi'(\hat t_n)|$. Pick a constant $\pi_0 > \frac{1}{\sigma_0} \sqrt{q_1 / \gamma}$ and set

$W_1(\varphi_n, \gamma, \pi_0; n) = W_1(\varphi_n, \gamma, \pi_0; n, X_1, \ldots, X_n) = \sup_{|t - t_n(\gamma)| \le \pi_0 n^{\gamma - 1/2}} |\varphi_n'(t) - \varphi'(t)|.$

By Lemma 6.3, except for an event with algebraically small probability, $|\hat t_n(\gamma) - t_n(\gamma)| \le \pi_0 \cdot n^{\gamma - 1/2}$
and consequently $|\varphi_n'(\hat t_n(\gamma)) - \varphi'(\hat t_n(\gamma))| \le W_1(\varphi_n, \gamma, \pi_0; n)$.
The following lemma describes the tail behavior of $W_1$.

Lemma 6.4 Fix 7 G (0,1/2), ttq > ^v^Wt «^^^ set = ^EJ=i^[^|]- ^'^ere 
exist constants Ci and C2 > such that for any {fi, a) G A„(g, A; /io, (Tq, eo), s„ < Ci, 

pru/ f \ ^ - 2)logn + 2s„ ( ) 

Jn 



where $c_1(q, \gamma)$ is as in Theorem 2.1. As a result, except for an event with algebraically
small probability, $|\varphi_n'(\hat t_n(\gamma)) - \varphi'(\hat t_n(\gamma))| \le W_1(\varphi_n, \gamma, \pi_0; n) \le O(\sqrt{\log n} / \sqrt{n})$.

We have now elaborated the inequalities in (6.1). The only missing piece is the
following lemma, which gives the basic properties of $\sigma_0^2(\varphi; t)$ and $\mu_0(\varphi; t)$.

Lemma 6.5 Fix q > 3 and A > 0, with il){t) and r„ as defined in ^2.13i) and 1^2. 19\} 
respectively, write ipit) = e„(y'(t) and r(t) = j^rit). For all {fio,(TQ,eo)- eligible 
(/X, a) and all t > 0, there is a constant C > such that 

WIM - all < . < C|^'(t)|/t, (6.2) 

|/.o(^,t) -/iol < \r'{t)\ . < C\^'{t)\. (6.3) 

|1 + r{t)\^ 

Additionally, uniformly for all (/i, a) G A„(g, A; fiQ, o"o, cq) and all t > 0, 

(al). \g{t)\ < e-^ < I, \g\t)\ < A, \g"{t)\ < C{l + A^), \g"\t)\ < C{l + A^), and 
\g\t)\ < Ae-"^ + min{A2te-^, 

(a2). consequently, \ip'{t)\/\Lp{t) \ = cTq • t ■ (1 + o{l)); 
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(aS). the second derivative of aQ{ip\t) is uniformly hounded, and aQ{ip\t) —>■ cXq, 
J^(Jo(</5; t) as t ^ oo. 

Similarly, both fiQ{ip]t) and its first two derivatives are uniformly bounded for all 
t> 0, and j^fj.o{ip;t) if fio{ip;t) /iq. 

We now prove Theorems 2.1, 2.2, and 3.1.
Proof of Theorem I2.lt Since the arguments are similar, we prove the first claim 
only. Write t„ = tn(7), = tn{l), and Wi{ipn', n) = Wi{ipn, 7, ttq; n). Pick constants 
qi and ttq such that 1 < gi/max{3, (g — 1 — 27)} < 2 and ttq > ^a/^iTt- Introduce 
events 

Bo = {Woi^n, n) < v/2^ri^}, B, = {W,{^^- n) < WgEIp^KlK}. 

Note that the choice of qi satisfies Ci(g,7) < gi/3 and C2(c'"o,g, 7) > (y'^\/2q[, where 
Ci(g, 7) and C2(o"o, 5,7) are defined as in Theorem 12.11 Use Lemma [6.21 and Lemma 
Ea P{B^o} < o(n-«i/3) g^j^^ P{Bl} < o(n-^i(5'^)); since Ci(g,7) < gi/3, P{B^U 
Bf} < o(n^'^i*^''''^)). We now focus on Bq fl Bi. By triangle inequality, \aQ{ipn',in) — 
cro(v5;^n)l < Wl{'^n]h)-crl{'^]in)\ + \al{'^]in)-crl{'^]tn)\. Note that by the choice of 
TTo and Lemma [6. 3[ |t„ — ^nl ^ ttq ■ n'''^^/^ for sufficiently large n, it thus follows from 
Lemma[63]that \al{ip]in) - al{ip]tn)\ ~o(|4-t„|) = o{rCi~''/'^)] lecalX 02(0-0, g, 7) > 
(ToA/2gi, so to show the claim, it suffices to show that as n — > cxd, 

\<yl{^nX)-<yl{^-X)\ < ^<jl-^2qi log -(1 + 0(1)), over Bq^Bl (6.4) 

We now show (16. 4p . Over the event i?o H recall |t„ — tn| < ^ron'^"-^/^, so 
by fl2.13p . tn ~ in ~ V27logn/(To; by Lemma [675| this implies crQ((/9,t„) ~ cTq and 
|V5'(t„)|/|V5(tn)| ~ crlin ~ (Totn. Moreover, since \(pn{in) - V5(in)| < V2gi logn/^/n, 
it follows that |v5(tn)|/(^n|v^n(^n)n ~ (1/^n)'^'''- Lastly, by Lemma WM. \Lp[^{in) — 
V^'(^n)| < 0(-\/logn/ v^)- Combining these, (16. 4p follows directly by applying 
Lemma [6. II with / = g = (p, and t = tn- □ 
Proof of Theorem 2.2. Note that, by the triangle inequality, $|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2| \le
|\sigma_0^2(\varphi_n; \hat t_n(\gamma)) - \sigma_0^2(\varphi; t_n(\gamma))| + |\sigma_0^2(\varphi; t_n(\gamma)) - \sigma_0^2|$. Theorem 2.2 now follows directly
from Theorem 2.1 and Lemma 6.5. □

Proof of Theorem I3.lt Without loss of generality, set /io = and o"o = 1- Write 



tn = V27 log n, e„ = e„(yU, a), (pn{t) = (pn{t; Xi, . . . , Xn,n), ip{t) = ip{t; iJ.,a,n), 
fi„(t) = fi„(t;Xi, . . . ,X„,n), and e„ = 0„(7; g, A, /iq, ao, eo). Set fi(t) = E[n„{t)], 
= sup|o<,<t}{l - ^nis)}, and ^*{t) = sup|o<,<t}{l - n{s)}. Note that it 
is sufficient to show that when n oo, (a) except for an event with algebraically 
small probability, sup^^^^jge^^ |^;(t„) - ^*(t„)| < 0{log-^/\n) ■ n^~^/^), and (b) 

suP{(^,a)Ge„}l^^-l| = o(l). 

We ffist show (a). By symmetry, |\l'*(i(:„) — does not exceed 

sup \Qn{t)-^{t)\<2 [ sup |Re((^„(t)) -Re(^(t))|rfe (6.5) 

0<t<t„ Jo 0<t<t„ 

Moreover, similar to the proof of Lemma 7.2 in [13], we have that for fixed q > 3/2, 



sup|(^^^)ge„} sup|o<j<j„} \Re{ipn{t)) -Re{ip{t))\ < 0{^\ogn/ y/n) except for an event 
with probability ~ 2 log^(r;,) ■ n~^'^^^. Elementary calculus yields I^E'* — '^*{tn)\ < 
0(v/I^/v^) ■ J^{1 - Oe(^'°s")-«'rf^ = 0(log"^/^(n) ■ n^~^/^), and (a) follows. 

We now show (b). Let / be the Fourier transform of / and let be the 

density function of A^(0, 5]{t)) with 5j{t) = t{(x] - 1)^/^ Set p{x) = 2(1 -cos(x))/x2 

for X 7^ and p(0) = 1. Elementary calculus shows that (p5j{t){0 = 6xp( 1 ) 

and p(^) = max{l — |^|,0}. So by the Fourier Inversion Theorem fl6[ Page 22], 



1 r (1 — a )t E 

n J -I 2 

where * is the usual convolution. Since * pitPj) = 1 when (/i^, (Xj) = (0, 1), 



1 - n{t) = en ■ Ave{j; (^^.,^^)^(o,i)}{l - </>5,(t) * p{tpj)}. (6.6) 
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Note that (pa„*p{bn) for any sequences {a„}^]^ and {&n}^i satisfying max{a„, 6„} 
oo, so by (16.61) and the definition of 0^, sup|(^ ,^)ge^} | ^ -'-I ~ '^i^)- Note that 

< <lforallt, so by ([61]) and the definition of n(t„) <^*{tn) < e„; 

as a result, 1 ^"'^*^*"^ - l| < li^Mii _ il and (b) follows directly. □ 
7 Appendix

7.1 Proof of Theorem 2.3

For short, write $\hat t_n = \hat t_n(\gamma)$ and $t_n = t_n(\gamma)$. The following two lemmas are proved in
Section 7.1.1 and Section 7.1.2, respectively.

Lemma 7.1 Witha{-) and An{a, B, q, A) as in Theorem \2.'d Fixr E {1.5,d—{2d+ 
2.5)7). As n ^ 00, uniformly for all {fi,cr) G An{a,B,q,A), except for an event 
with a probability 0/ o(n"^''/^), sup|o<t<iog„} \ fn{t) — ^{t) \ = o(n~('^~'')/(^'^+^-^)). 
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Lemma 7.2 Witha{-) and An{a, B, q, A) as in Theorem \2.3[ Fix'j E (0, 2^+2^5 ) ^'^'^ 
an integer k > 0. As n —>■ 00, for all {ji, a) G A„(a, B, q, A), sup|Q<^<iog„|{|(y9i^^(t) — 
^^'Km < 0,{\og''\n)), and - = 0,{\og^^^'/'^\n)/^). 

To show the theorem, it is sufficient to show that 

\Min) - ^{L)\ = Op{l/V^), l^'Jn) - = Op(log'^+i/2(n)/v^). (7.1) 

In fact, by the triangle inequality,

Woi'^n^D -al{(^;tn)\ < |(To(v9„;4) -aQ{ip;i„)\ + |o-o(v9;4) - a^{ip;tn)\. (7.2) 

Once (17.11) is proved, by similar arguments as in the proof of Theorem 16. 3[ 

\in-tn\ = Opin^-'/'), (7.3) 

it thus follows from Lemma 6.5 that

\al{ip;L)-cTliip;tn)\ = Opi\in-tn\) = Op{n^^^/^). (7.4) 

At the same time, by (17. 3p and Lemma 16751 except for an event with asymptotically 
vanishing probability, |v2n(4)|/|v'n(i"n)P ~ n^, <yl{^]in) ~ and |v?'(t„)|/|v2(4)| ~ 
crlin] applying Lemma [6TT] with / = g = 'f, and t = t„, it follows that 

W^oi^n, in) - a^oi^; 4)1 = Op{n^-'/^). (7.5) 

The theorem follows directly by inserting (7.4) and (7.5) into (7.2).

We now show (7.1). Since the proofs are similar, we only show the first equality.
Applying Lemma [7.11 with r = (1.5 + d — {2d + 2.5)7)/2, it follows that there is an 
event An such that P{A'^} is algebraically small and 

sup \(Pn{t) - <^{t)\ < d{n'^'^^^^^^), over (7.6) 

{0<t<logn} 
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By similar arguments as in the proof of Lemma 16.31 it follows that 

14(7) - tnil)\ < o(n5-(^-m)), over (7.7) 

notice the exponent is negative. Now, let i be the smallest integer satisfying {i + 
^) ■ 1^ ~ 2^+2^5 1 ^ Taylor expansion, for some ^ falling between t„ and t„. 



fc=0 



Notice that by the choice of i and (17. 7p . (t„ — tn)^^"*^ = o(1/a/?2) over An, the claim 
follows directly from Lemma I7.2[ □ 

7.1.1 Proof of Lemma 7.1

Applying P Theorem 1.3] with b = 2, q = n^d-r)/(d+i.25) ^ ^ ^ y/32Fhgn/^ 
gives P{|Re((^„(t) - (^(t))| > e} < d{n-^) and P{|Im((^„(t) - ^{t))\ > e} < o(n-0, 
it thus follows 

P{\iPn{t) - m\ > V2e} < o(n-^). (7.8) 

The remaining part of the proof is similar to that of Lemma 16.21 so we keep it 
brief. Fix 6 G (1/2, oo), with the same grid and similar arguments as in Lemma [621 
it follows that 

P{ sup \Mtk)-^{t)\>{V2€ + ^)}<I + II, (7.9) 

{0<i<logn} 

where / = P{sup|i<fc<„.-i„g„} |^„(tfc) -^(4)] > V2e} and 11 < P{n-^ snpt{\!f'^{t) - 
V^'(^)|} > :^}- The key for the proof is to show that 

1 

Var(-^ |Xj|) < Clog2'^(n)/n. (7.10) 

In fact, once (17.101) is proved, then on one hand, by (17.81) . / < 'ri''log(n) ■ d{n~^) = 
d{n^^'^^^^). On the other hand, by similar arguments as in the proof of Lemma [6.21 

n n 
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where s„ = ^Yl^=i — C^og"'{n) as maXj{|yUj| + \(rj\} < log"(r;,). The claims 

follows by taking 6 = r/3. 

We now show (17.101) . Applying [3l Corollary 1.1] with p=1.5, g = r = 6, 

Var(^E|X,|) = ^5^Cov(|X,i|X,|)<^5^a^/=^(|j-fc|)||X,||6||^ 

i=l j,k j,k 

By ma.Xj{\fij\ + \<yj\} < log"(n), ||Xj||6 < Clog"(n) for all I < j < n; since 
a{k) < Bk-'^ with d > 1.5, ^7A0\f follows by observing Ej.fc"^^^(li - ^1) < 
Cn EZi «'^'('^) < Cn k-^''''' = 0{n). □ 

7.1.2 Proof of Lemma 7.2

Consider the first claim. By direct calculations, 

j=l j=l j=l 

where the right hand side does not depend on t. Since max{j}{|yUj| + \o-j\} < 

B log°(n), the claim follows directly from E\Xj\^ < C ■ +\aj\'') < C ■ log^'^X^), 

V 1 < j < n, where C = C(A;) is a generic constant. 

Consider the second claim. Introduce an event Dn = {maxj{\Xj\} < 3B log"'*'^^^(n 

By max{j}{|/ij| + |crj|} < -Blog'*(n) and direct calculations, 

P{D^J < ^P{|Xj| > 351og"+^/2(n)} < 2n<l(3v/k^- 1) = d{n-^), (7.11) 
j 

where $ is the survival function of A^(0, 1). To show the claim, it suffices to show 
EM'\tr.) - V^'\tn)) ■ l{D,.}f = Oi\og^"^^'^'in)/n). (7.12) 

Now, first, observe that |x|^exp(— ^^^^j*^ ) = o(l), where o(l) ^ as n ^ oo, 
uniformly for all |x| > 35 log"^^^^(n) and {fij,aj) satisfying + |crj| < i?log"(n); 
combining this with dZH]) gives |^(<^„(t„) ■ 1{ds})I < ^EJ=i^(l^jf " = 
d{n^^). Notice that E(pn\tn) = (p^^\tn), we thus have 

E[{^V{tn) - V^'\tn)) ■ 1{A.}] = -EW^n\tn) " 1{D,}] = o{l/n). (7.13) 




Second, as max{j}{|Xj |} < 3B\og"''^^^^{n) over Dn, by Billingsley's inequality [31 
Page 22], 

Combining this with fl7.13p gives (17.121) . □ 



7.2 Proof of Lemma 6.1

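For short, we drop $t$ from the functions whenever there is no confusion. For the first
claim, by direct calculations, we have

$\sigma_0^2(g, t) - \sigma_0^2(f, t) = \frac{\mathrm{Re}(f)\mathrm{Re}(f') + \mathrm{Im}(f)\mathrm{Im}(f')}{t |f|^2} - \frac{\mathrm{Re}(g)\mathrm{Re}(g') + \mathrm{Im}(g)\mathrm{Im}(g')}{t |g|^2} = I + II + III,$

where $I = (1 - \frac{|g|^2}{|f|^2}) \cdot \sigma_0^2(g, t)$, $II = \frac{1}{t |f|^2} \cdot [\mathrm{Re}(g') \cdot \mathrm{Re}(f - g) + \mathrm{Im}(g') \cdot \mathrm{Im}(f - g) +
\mathrm{Re}(g) \cdot \mathrm{Re}((f - g)') + \mathrm{Im}(g) \cdot \mathrm{Im}((f - g)')]$, and $III = \frac{1}{t |f|^2} \cdot [\mathrm{Re}(f - g) \cdot \mathrm{Re}((f - g)')
+ \mathrm{Im}(f - g) \cdot \mathrm{Im}((f - g)')]$. Now, firstly, using the triangle inequality,

$|I| \le \frac{|\sigma_0^2(g, t)|}{|f|^2} \cdot ||f|^2 - |g|^2| \le \frac{|\sigma_0^2(g, t)|}{|f|^2} (2 |g| \cdot |f - g| + |f - g|^2);$

secondly, using the Cauchy-Schwarz inequality, $|\mathrm{Re}(z)\mathrm{Re}(w) + \mathrm{Im}(z)\mathrm{Im}(w)| \le |z| \cdot |w|$
for any complex numbers $z$ and $w$, so it follows that

$|II| \le \frac{1}{t |f|^2} \cdot [|g'| \cdot |f - g| + |g| \cdot |(f - g)'|], \quad |III| \le \frac{1}{t |f|^2} \cdot |f - g| \cdot |(f - g)'|;$

combining these gives

$|\sigma_0^2(g, t) - \sigma_0^2(f, t)| \le \frac{1}{t |f|^2} [(2t \cdot |\sigma_0^2(g, t)| \cdot |g| + |g'|) |f - g| + |g| \cdot |(f - g)'|] + r_n^{(1)},$

where $r_n^{(1)} = \frac{1}{t |f|^2} \cdot [t \cdot |\sigma_0^2(g, t)| \cdot |f - g|^2 + |f - g| \cdot |(f - g)'|]$, and the claim follows directly.

For the second claim, by direct calculations,

$\mu_0(g, t) - \mu_0(f, t) = \frac{\mathrm{Re}(f')\mathrm{Im}(f) - \mathrm{Re}(f)\mathrm{Im}(f')}{|f|^2} - \frac{\mathrm{Re}(g')\mathrm{Im}(g) - \mathrm{Re}(g)\mathrm{Im}(g')}{|g|^2} = I + II + III,$

where $I = (1 - \frac{|g|^2}{|f|^2}) \cdot \mu_0(g, t)$, $II = \frac{1}{|f|^2} \cdot [(\mathrm{Re}(g') \cdot \mathrm{Im}(f - g) - \mathrm{Im}(g') \cdot \mathrm{Re}(f - g)) +
(\mathrm{Im}(g) \cdot \mathrm{Re}((f - g)') - \mathrm{Re}(g) \cdot \mathrm{Im}((f - g)'))]$, and $III = \frac{1}{|f|^2} [\mathrm{Re}((f - g)') \cdot \mathrm{Im}(f - g)
- \mathrm{Re}(f - g) \cdot \mathrm{Im}((f - g)')]$. As in the first part,

$|I| \le \frac{|\mu_0(g, t)|}{|f|^2} (2 |g| \cdot |f - g| + |f - g|^2), \quad |II| \le \frac{1}{|f|^2} \cdot [|g'| \cdot |f - g| + |g| \cdot |(f - g)'|], \quad |III| \le \frac{1}{|f|^2} \cdot |(f - g)'| \cdot |f - g|;$

combining these gives

$|\mu_0(g, t) - \mu_0(f, t)| \le \frac{1}{|f|^2} \cdot [(2 |\mu_0(g, t)| \cdot |g| + |g'|) \cdot |f - g| + |g| \cdot |(f - g)'|] + r_n^{(2)},$

where $r_n^{(2)} = \frac{1}{|f|^2} \cdot [|\mu_0(g, t)| \cdot |f - g|^2 + |f - g| \cdot |(f - g)'|]$, and the claim follows. □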



7.3 Proof of Lemma 6.2

Lay out a grid tfc = k/n^, for k = 1, . . . , logn and 6 G (l/2,gi/2). For any 
< t < logn, pick the closest grid point t^, so that — 1\ < and 

where the second term is < ■ sup^ |v5'„(t) — V^'(i)|- Write: 



V2gi logn \ ^ \ ( \ 



n 



where \l{qi,n) = V2gi log«-21oglogn/V29i logn A2(gi,n) = 21°glogn^2gilog n_ ^j^^^ 

follows that 

P{ sup |y,„(t)-y,(t)|>^^^?^^I^} </ + //, (7.14) 

0<t<logr!, 
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where / = P{supi<fc<„i iog„ \(Pn{tk)-^{tk)\ > Xi{qi,n)}, and// = P{n ^-supt |v2'„(t)- 
ip'it)\>X2iqun)}. 

For I, a direct generalization of Hoeffding's inequality ^2] to complex-valued 
random variables gives: 

/< (n^logn)4e-3"^?(5i'«) =4n^logn-e"'^+'°^'°^"^'~^^^ (7.15) 
< (4n^logn)(n-^^/2logn) = An^-'^'^Hog^ n. (7.16) 

For II, direct calculations show that sup^ |v^'„(^) — V''(^)| ^ ^ ■ Yl]=ii\-^j\ + -^l^jD- 
Denote s„ = ji^^=iE\Xj\ for short, it follows from Chebyshev's inequality that: 

1 " 

// < P{- J2^\Xj\ + E\X,\) > n' ■ A2(g, n)} (7.17) 



n 



= P{- Ed^.l - ^l^.l) ^ ■ ^^(g, n) - 2.„} = o(^~2^ 1^S (l<;gH) ), (7.18) 
n ^ log(r;,) 

where we have used the fact that Sn is uniformly bounded from above by a constant 
C(g, A, /io, (To) < oo. Inserting - fmsD to (17^1^]) and taking 5 = gi/6 give: 



P{ sup |y^„(t) -y.(t)| > = 41og^(n) -n'^^/^- (1 + 0(1)), gi > 3. 

n<t<lnP'r), V'^ 



0<t<logn 

This concludes the proof of Lemma 6.2. □



7.4 Proof of Lemma 6.3



For short, write t„ = t„(7), t„ = t„(7), V5n(t) = v?„(t; Xi, . . . , X„, ra), v2(t) = 
ip(t; fi,a,n), and A„ = A„(g, A; /iq, ctq, eo). We claim that for sufficiently large n, 
|v3(t)| is monotonely decreasing in t over [loglogri, oo). In fact, using Lemma [6.5[ 
when n — i> oo, inf|(>iogiogn}{c"o(v5; t)} = cTq ■ (1 + o(l)) > 0; recall that 

^|^(t)| = -t-|^(t)|-a2(^,t), (7.19) 

the monotonicity follows directly. 
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We now focus on the event Dn = {Wo{ipn',n) < \/2qi logn/ y/n}. Recall that 

= Iv5n(4)| = n-'^, so 

\\(p{in)\-\(pitn)\ \ = \ \(p{tn)\-\(Pniin)\ \ < l^iD - ^niDl < V^Ql logu/ y/u; (7.20) 

combining fl7.19p and fl7.20p and using Taylor expansion, there is a falling between 
tn and in such that 



\tn tr. 



mo 



< 



y/2qi log n/ ^Jn 

M^(oi-ko'(^,or 



(7.21) 



(7.22) 



At the same time, elementary calculus shows 

(1 - 2eo)e-"o*'/' < \^{t)\ < e-'^o*'/^ V t > 0. 

Combining fl7.20p and fl7.22p . it follows that > log logn for sufficiently large n. 
Since \(p(t)\ is monotone over [log logn, oo), so ( I7.20p and (I7.22P further imply that 



IviOl ^ n and ^ ^ in ^ tn ^ \/2'-f log n/ao] these, together with Lemma 16751 



imply that ~ f^o- Inserting these into ^M) gives |i - 1„| < ^'^^!°^."(^ 



□ 



7.5 Proof of Lemma 6.4

Lay out a grid tfc = {tn{-f)-Ton^'^/'^) + ^, for 1 < A; < 2Ton^+^-^/'^ and 5 G [1/2, oo). 
For any t G [tfc,4+i], 

K{t)-v'{t)\<\^'n{h)-^'{h)\+n-'.( sup K{0-^"{0\)- (7.23) 

By direct calculations and the definition of s„, 

it thus follows that: 



n 



-2 



1 

-j:(^]-E{Xf)) + 2s. 



n 



(7.24) 



1 

[-5:(Xj-i?(X|))] + -^. (7.25) 

7 = 1 V 
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Now, denote gi = g/2 — 1 for short, write: 



- - Ai(g,n)+A2(g,n)H — ^,(7.26) 



whereA,(g„n) = (.-„v/2^n^-(,,i^ij^))/v/^ and A^lgi, n) = 
Compare (17:261) with (ESI " (ESS]) gives: 



or \ I i+\ ^ ■ (V2gilogn + 2s„) 
P{ sup |^„(t) - (t) I > ^= 1 <I + II, 



|t-tn(7)l<'rO-nT-l/2 

where / = P{supi<fc<2^(,„«+7-i/2 Iv^U^fc) - V^'l^fc)! > Ai(gi,n)}, and // = P{n-^ ■ 

For I, by [TSl Theorem 1] and direct calculations, 

/ < (27ron^+^-'/')-o(e-5"^?(5i.«)) < {2non^+^-^/^)-d{n-'^') = o(n^+^-^/'-'?^). (7.27) 

For II, we study for the case q < A and the case g > 4 separately. 

For the case g < 4, set 6 = (gi + 1 — 7)/2 > 1/2, by Chebyshev's inequality, 

1 " -2 

where we have used the fact that is uniformly bounded from above by a constant 
Ci = Ciil, A, 1^0) o"o) < oo. Notice that the choice of 5 satisfies 5 + 7 — 1/2 — gi = 
1/2-5 = (l + 7-g/2)/2, combining (17:271) and (17:281) gives / + // < o(n(^+^-«/2)/2). 

For the case g > 4, notice that ^ J2j=iEi-^j ~ ^[-^j])^ is uniformly bounded 
from above by a constant C2 = C2(g, A, /io, ctq) < 00, it follows from Chebyshev's 
inequality that 

Set 6 = max{l/2, (g — 1 — 27)/6}, combining (17.27!) and (17.291) gives: 



I + IK 



0(71^+^-9/2), 4<g<4 + 27, 

o(ri(27+i-9)/3)^ g>4 + 27. 

This finishes the proof of Lemma 6.4. □

7.6 Proof of Lemma 6.5

First, we show fl6.2p . Write \(p{t) \ = \ipo{t) \ ■ \1 + r{t)\, recall that aQ{ipo;t) = a^t, so 
,Mt)\ _ i\Mt)\ ■ \l + r{t)\ + \Mt)\ ■ i\l + r{t)\ _ , i\l + r{t)\ 

-Onl 



d 

dt 


\Mt)\-\ 


1 


+ rit)\ + \Mt)\ ■ i\ 


|l + r(t)| 




\Mt)\-\ 


\l + r{t)\ 



Mt)\ |^o(t)|-|l + r(t)| ° " |l + r(t)| ' 

and it follows that a^i(p,t) - ctq = + " |1 + which yields (ESD 

by direct calculations. 

Next, we show fl6.3p . For short, we drop t from all expressions whenever there is 
no confusion. Since ip = v9o(l + r), Re{(p) = Re{(po) + Re(r)Re(v9o) — Im(r)Im(y9o), 
and lm{(p) = Im((y9o) + Ini(r)Re(v9o) + R'e(r)Im((y9o); it can be showed that 

Re{^') ■ lm{^) - Im(y?') ■ Re{ip) = 1 + 11, (7.30) 

where / = — 11 + rp/iolv^oP, and // = Iv^oP ■ [— Ini(r') + Re(r')Im(r) — Im(r')Re(r)]. 
The proof of (I7.30p is long, so we leave it to the end of this section. Now, 

/ + // Im(r') - Re(r')Im(r) + Re(r)Im(r') 



Ho{^;t) = --— = Ho 



IV^I |1 + 

so by Cauchy-Schwartz inequality, 

|Im(r') — Re(r')Im(r) + Im(r')Re(r)| , 1 + |r| 

/xo ¥5, t) - /io = r—^ — ^ < r ■ J— — ^, 

|1 + |1 + 

and (16.31) follows directly. 

Next, we show (al) and (a3). (a2) follows directly from (al) and direct calcula- 
tions, so we omit it. 

We now show (al). For the 5 inequalities, the proofs for the first 4 are similar, so 
we only show the second one and the last one. First, consider the second inequality. 
Use Holder's inequality, 

Aveo, (;.,,.,)^(;.o,<xo)}{(^| - ^o)} < (7.32) 
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Note that sup|^>o} xe ^^^^ = 1/e < 1, direct calculations show that 

\g'{t)\ < Avey, (^^.,<,.)^(^o,^o)}{e ^ ■ [\f^j - /iol + ((^| - (^o)t]} (7.33) 

< Aveij. (;.„<x,)^(Mo,<xo)}{l/^i - /^ol + (c^l - fTo)^/^ ■ - ao)^/^t ■ e ^ ]} 

< Aveij. (/.,,a,)^(Mo,<xo)}{l^j - ^ol + (f^j - (^oY^^}, 

the second inequality follows directly by using (17.311) . Second, consider the last 
inequality. By the definition of r„ and fl7.3ip . it is seen 

(o-2-o-g)t2 

At the same time, notice that sup|^.>Q| a;e~^'/^ = 2/e, so e 2 • (aj — aQ)t < 
min{e-^"*'/2 . (^^2 _ 0-2)^^ 2/ (et)}, and it follows from fl7:32|) that 

Aveo, (^,,.,)^(^o,<xo)}{e~^^^ ■ (a| - a2)t} < mm{A'e~^-''/H, 2/(et)}. (7.35) 

The claim follows by combining (17.331) - (I7.35p . 

Next, we show (a3). As the proofs are similar, we only show that corresponds 
to cTq. By (16. 2p . \aQ{(p;t) — (Tq\ ^ uniformly; by (al), it is not hard to show that 
cTQ(<y9;t) and its first two derivatives are all uniformly bounded; so all remains to 
show is that J^cro(v?;^) ~^ uniformly. Observe that for any twice differentiable 
function / and A > 0, -f (t)| < sup|,|{|/"(s)|}A, so it follows \ f'{t)\ < 

{sup|4{|/"(s)|}A + ^sup|,^^,>i|{|/(s) - /(s')|}; the claim follows by taking A = 
^sup^,,,,>,}{|/(.)-/(.')|}/sup^,}{|/"(.)|} and /(t) = aUcp; t). 

Lastly, we validate ^TM) . Write Re{ip') = Re(¥?o)+Re(r')Re((/?o)+Re(r)Re((/?'o)- 
Im(r')Im((y9o) — Im(r)Im((/?Q), and lm{ip') = lm{{pQ) + Im(r')Re(v9o) + ^t^{i^)^^{^o) + 
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Re(r')Im((y9o) + Re(r)Im(y9Q), we have 

Re{ip') ■ Im(v?) - lm{ip') ■ Re{ip) = Re(v9o) [Im(v9o) + Im(r)Re(v9o) + Re(r)Im(v9o)] 
+ Re(r')Re((^o)[Im(<^o) + Im(r)Re((^o) + Re(r)Im((^o)] + Re(r)Re((^o) [Im((^o) + 
Im(r)Re((y9o) + Re(r)Im(y9o)] — Ini(r')Im((y9o) [Im(99o) + Im(r)Re(v9o) + Re(r)Im((^o)] 
- Im(r)Im(99o)[Im(v?o) + Im(r)Re(v3o) + Re(r)Im(v9o)] - Ini(v9o)[Re(v?o) + Re(r) 
Re(v3o) - Im(r)Im(v9o)] - Im(r')Re(v?o) [Re(v3o) + Re(r)Re(v3o) - Im(r)Im(v9o)]- 
Im(r)Re((/?o)[Re(¥?o) + Re(r)Re(^o) - Im(r)Im(v?o)] - Re{r')lm{ifo)[Re{(fQ) + 
Re(r)Re(v?o) - Im(r)Im(v?o)] - Re(r)Im(v9o)[Re(v3o) + Re(r)Re(v?o) - Im(r)Im(v?o)]; 

by cancellations, this reduces to 

Re(yp'o) ■ [Im((^o) + Re(r)Im((^o)] + Re(r')Re(<^o) [Im(r)Re(<^o)] + Re(r)Re(<^'o) 
[Im(v9o) + Re(r)Im(v9o)] - Im(r')Im(v9o) [Im(v5o) + Re(r)Im(v9o)] - Im(r)Im(v9o) 
[Im(r)Re(^o)] - Im(<^'o) ■ [Re((^o) + Re(r)Re((^o)] - Im(r')Re((^o)[Re(y^o) + Re(r) 
Re(v9o)] - Im(r)Re(v?o) " [-Im(r)Im(v9o)] - Re(r')Im(v9o) [-Im(r)Im(v9o)] 
- Re(r)Im((^'o)[Re(¥;o) + Re(r)Re((/Po)]; 

by recombinations, this reduces to |1 + rp ■ [Re(</9Q)Im(y9o) — R-e('y5o)Im(v^o)] + Iv'ol^ • 
[-Im(r') + Re(r')Im(r)-Im(r')Re(r)]. Note that [Re(v9o)Im((^o) - Re((^o)Im(/o)] = 
— /iolv'oP) (I7.30I) follows directly. □ 
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